Cyrus Leung | 09dc7c690c | 2025-12-24 09:54:01 -08:00
[Chore][1/2] Drop v0.14 deprecations (#31285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

ゆり | 506eb0f454 | 2025-12-24 17:22:48 +00:00
[Bugfix] Remove dead block_quant_to_tensor_quant function (#31294)
Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Kevin McKay | 66c9887440 | 2025-12-24 10:37:11 -05:00
[Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>

Cyrus Leung | aa3868ecfe | 2025-12-24 05:38:46 -08:00
[Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

wang.yuqi | bd89ce16d2 | 2025-12-24 09:54:57 +00:00
[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>

Pleaplusone | b41aeb3468 | 2025-12-24 16:47:44 +08:00
[Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261)
Signed-off-by: ganyi <ygan@amd.com>

Xiong Wang | bb24592d13 | 2025-12-23 21:33:54 -08:00
[Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007)
Signed-off-by: Xiong Wang <wangxiongts@163.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>

Andreas Karatzas | e42894f5b5 | 2025-12-24 02:56:58 +00:00
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Wentao Ye | 76e6a95192 | 2025-12-24 10:41:09 +08:00
[Bug] Fix Number of dimensions of tensors must match. for Deepseek V3.2 (#31160)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Cyrus Leung | dd424571c8 | 2025-12-24 10:15:47 +08:00
[Bugfix] Enable dynamic_dims for different embeds shape (#31223)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Asaf Joseph Gardin | 34916ae37f | 2025-12-23 21:57:00 +01:00
[Mamba] - Consolidate Mambas Attention Logic (#28133)

Patrick von Platen | 3faa8bee57 | 2025-12-23 05:31:55 -08:00
adapt voxtral (#31095)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>

Harry Mellor | b10d47e0e0 | 2025-12-23 11:41:49 +00:00
Add util function for checking nesting of rope parameters (#31146)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Jakub Zakrzewski | 23daef548d | 2025-12-23 11:19:16 +00:00
[Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

Jee Jee Li | 6b16fff01b | 2025-12-23 07:44:01 +00:00
[Bugfix] Fix Jais2ForCausalLM (#31198)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Pavani Majety | 3e10262356 | 2025-12-22 18:15:33 -08:00
Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>

Robert Shaw | b57b967386 | 2025-12-22 16:42:58 -07:00
[MoE Refactor][7/N] AITER MK (#31102)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Pavani Majety | b10f41c894 | 2025-12-22 19:15:57 +00:00
[SM100] Enable fp8 compute for prefill MLA (#30746)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>

Yongye Zhu | 7b926e8901 | 2025-12-22 17:34:19 +00:00
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

dengyunyang | 8f8f469b1b | 2025-12-22 05:25:59 -08:00
[BugFix] skip language model in Encoder (#30242)
Signed-off-by: dengyunyang <584797741@qq.com>

Li Wang | 256a33ecb4 | 2025-12-22 02:15:54 -08:00
[Model] Fix bagel failed to run (#31132)
Signed-off-by: wangli <wangli858794774@gmail.com>

Kevin McKay | cf8eed7bef | 2025-12-21 21:14:58 -08:00
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Kevin McKay | 14c3e6ade3 | 2025-12-21 21:14:14 -08:00
[Misc] Fix spelling typos in model comments (#31117)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>

CedricHuang | 19cc9468fd | 2025-12-21 22:34:49 -05:00
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)

Robert Shaw | b471092d3a | 2025-12-21 12:37:42 -05:00
[MoE Refactor][4/N] Marlin Fp8 Mk (#31036)

Jinzhen Lin | 7c73ceb581 | 2025-12-20 21:58:11 +00:00
[Quantization] add marlin w4a8/w8a8 check (#31061)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

Jinzhen Lin | ee52d9901d | 2025-12-20 12:02:57 -08:00
[Quantization] support logical_widths for fp8 marlin (#30962)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

baonudesifeizhai | 54c8924384 | 2025-12-20 18:22:04 +00:00
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>

Yan Ma | 560ae9638c | 2025-12-20 13:45:27 +00:00
[XPU] enable fp8 online streaming quantization (#30944)
Signed-off-by: Yan Ma <yan.ma@intel.com>

Yuxuan Zhang | 8a7a414374 | 2025-12-20 00:09:58 +00:00
GLM-4.7 Tool Parser and Doc Update (#30876)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>

Robert Shaw | 95befecc18 | 2025-12-19 23:36:38 +00:00
[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Wentao Ye | 4cf9429897 | 2025-12-19 23:31:31 +00:00
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors for Deepseek V3.2 (#31046)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Robert Shaw | 83a317f650 | 2025-12-19 13:09:54 -08:00
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Wentao Ye | 3bd8335bd0 | 2025-12-19 13:50:39 -07:00
[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (#30898)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Zhonghua Deng | 969bbc7c61 | 2025-12-19 17:17:03 +00:00
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Jinzhen Lin | 5fbfa8d9ef | 2025-12-19 07:33:22 -08:00
[Quantization] fix marlin w8a8 check (#30961)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

Shanshan Shen | 23a1946e3b | 2025-12-19 22:16:09 +08:00
[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021)
Signed-off-by: shen-shanshan <467638484@qq.com>

Jinzhen Lin | 9187de9fac | 2025-12-19 08:56:35 +00:00
[Quantization] enable compressed-tensors marlin support for turing (2) (#31008)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

Jinzhen Lin | de08b8f61b | 2025-12-18 20:29:48 -08:00
[Quantization] enable compressed-tensors marlin support for turing (#31000)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

Andreas Karatzas | 7b43db210c | 2025-12-19 02:17:27 +00:00
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Wentao Ye | 97000a2be7 | 2025-12-18 14:45:55 -05:00
[Bug] Fix compressed tensor not using deepgemm (#30820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

navmarri14 | b8c477c115 | 2025-12-18 11:41:59 -08:00
tuned fused configs for B300 (#30629)

jiahanc | 53ad423f26 | 2025-12-18 14:31:18 -05:00
[Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary (#30729)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

Isotr0py | 700a5ad6c6 | 2025-12-19 02:04:19 +08:00
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Vasiliy Kuznetsov | f4ee2c3d90 | 2025-12-18 11:45:15 -05:00
fix fp8 online quantization streaming with tp > 1 (#30900)
Signed-off-by: vasiliy <vasiliy@fb.com>

Xin Yang | 9a5e96523b | 2025-12-18 08:42:22 -08:00
[LoRA] Set default MXFP4 LoRA backend to Marlin (#30598)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

sarathc-cerebras | 28d15ab56b | 2025-12-18 15:46:58 +00:00
adds jais 2 support (#30188)
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Wentao Ye | 6628758233 | 2025-12-18 07:27:51 -08:00
[Bug] Fix batch invariant in torch 2.10 (#30907)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Michael Goin | 100f93d2be | 2025-12-18 14:51:17 +00:00
Filter safetensors files to download if .safetensors.index.json exists (#30537)
Signed-off-by: mgoin <mgoin64@gmail.com>

Ming Yang | 8372be2828 | 2025-12-18 09:02:38 +00:00
[moe] Use enable_chunking func (to support disabling chunking) (#29935)
Signed-off-by: Ming Yang <minos.future@gmail.com>