20852c8f4c · Li, Jiang · 2025-11-19 10:32:00 +08:00
  [CPU] Refactor CPU WNA16 (#28826)
  Signed-off-by: jiang1.li <jiang1.li@intel.com>

da94c7c0eb · Jerry Zhang · 2025-11-18 16:52:41 -08:00
  Move online quantization to model.load_weights (#26327)
  Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

1395461f5f · tomeras91 · 2025-11-18 16:49:36 -08:00
  [Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
  Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

9912b8ccb8 · Varun Sundar Rabindranath · 2025-11-18 16:45:20 -08:00
  [Build] Add OpenAI triton_kernels (#28788)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

e4bb2684bc · Isotr0py · 2025-11-18 18:56:04 +00:00
  [Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

c2612371ad · Luciano Martins · 2025-11-18 08:56:29 -08:00
  [Model] Add Gemma3 GGUF multimodal support (#27772)
  Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

b9489f51e1 · Canlin Guo · 2025-11-18 11:51:54 +00:00
  [Model][Perf] Use cos and sin cache in QwenVL (#28798)
  Signed-off-by: gcanlin <canlinguosdu@gmail.com>

0168f69e50 · Ning Xie · 2025-11-17 20:33:46 -08:00
  [Misc] Remove unnecessary parentheses from log statements (#28897)
  Signed-off-by: Andy Xie <andy.xning@gmail.com>

3ddcf46011 · Wentao Ye · 2025-11-17 20:29:29 -08:00
  [Refactor] Remove Unused Func in Batch Invariant (#28881)
  Signed-off-by: yewentao256 <zhyanwentao@126.com>

d0a73620cc · xuebwang-amd · 2025-11-18 11:16:45 +08:00
  [ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638)
  Signed-off-by: xuebwang-amd <xuebwang@amd.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
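Commit 0168f69e50 (#28897) above removes unnecessary parentheses from log statements. One common form of that cleanup, assuming the stdlib-`logging`-style lazy %-formatting that vLLM's logger follows (the logger name and variable below are illustrative, not from the PR):

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("example")

num_gpu_blocks = 1024

# Before: redundant parentheses, and the message string is built eagerly
# with the % operator even when the DEBUG level is disabled.
logger.debug("num_gpu_blocks: %s" % (num_gpu_blocks))

# After: lazy %-style formatting; the string is only rendered if the
# record is actually emitted, and no extra parentheses are needed.
logger.debug("num_gpu_blocks: %s", num_gpu_blocks)
```

Both calls log the same text; the second defers formatting to the handler, which is the idiom linters such as pylint's `logging-not-lazy` check for.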
f77bce001a · Pranav · 2025-11-17 15:11:20 -08:00
  [Model] Add Afmoe architecture implementation (#28332)
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
  Signed-off-by: Pranav <veldurthipranav@gmail.com>
  Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

95ae50b7d1 · Shreyas Kulkarni · 2025-11-17 15:01:34 -08:00
  [Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435)
  Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>

f8b19c0ffd · Zhewen Li · 2025-11-17 13:15:26 -05:00
  [Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
  Signed-off-by: zhewenli <zhewenli@meta.com>

ab01cd14e5 · wuyaoxuehun · 2025-11-17 17:13:11 +08:00
  [BugFix] Fix glm4_moe_mtp load weights bug (#28805)
  Signed-off-by: wuyaoxuehun <798143193@qq.com>

561253b37f · jiahanc · 2025-11-16 18:02:42 -08:00
  [Performance][Fix] update nvfp4 code to support renorm routing (#28569)
  Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>

03ee48111d · amirkl94 · 2025-11-16 13:39:44 -05:00
  Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)

5a87076d6e · Lukas Geiger · 2025-11-16 17:37:15 +00:00
  [Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769)
  Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

8d259fad6c · Anna Shors · 2025-11-16 13:12:45 +00:00
  Fix gpt oss weight loading with EP + bf16 (#28765)
  Signed-off-by: ashors1 <ashors@nvidia.com>

af02c40970 · Dezhan · 2025-11-16 09:46:29 +00:00
  Fixed gpt-oss _load_weights_other() parameter position bug (#28715)
  Co-authored-by: Dezhan Tu <dztu@meta.com>

07cadab27a · Lukas Geiger · 2025-11-15 19:03:09 +00:00
  [Model][Qwen3VL] Cache positional embedding indices (#28475)
  Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
  Co-authored-by: Roger Wang <hey@rogerw.io>
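Several of the QwenVL changes above (#28798, #28769, #28475) amount to computing rotary cos/sin tables or position indices once and reusing them across forward passes. The caching idea can be sketched in plain Python; the function name, signature, and base value are illustrative assumptions, not vLLM's actual implementation:

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def rope_cos_sin(max_pos: int, dim: int, base: float = 10000.0):
    """Precompute rotary cos/sin tables; repeat calls reuse the cache."""
    # One inverse frequency per pair of dimensions, as in standard RoPE.
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]
    return cos, sin


# First call pays the O(max_pos * dim) cost; later calls with the same
# arguments return the cached tables immediately.
cos, sin = rope_cos_sin(64, 8)
```

In a real model the tables would be tensors kept on-device and sliced per sequence length; the cache-keyed-by-shape pattern is the part these commits share.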
e439c784fa · Eldar Kurtić · 2025-11-15 06:12:02 -08:00
  Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
  Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>

085a525332 · hwhaokun · 2025-11-15 05:44:12 -08:00
  [Model] Fix lmhead init bug of bailing_moe (#28777)
  Signed-off-by: hwhaokun <haokun0405@163.com>
  Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

cb15ee28db · tingtinggithub · 2025-11-15 04:18:08 -08:00
  Allow Gemma3 to take image embeddings (#28483)
  Signed-off-by: tingtinggithub <streamttt@gmail.com>

1ec978c209 · Zhewen Li · 2025-11-15 01:10:48 -08:00
  [Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709)
  Signed-off-by: Zhewen Li <zhewenli@meta.com>

6965ef436f · Varun Sundar Rabindranath · 2025-11-15 13:52:14 +08:00
  [Performance][DeepGEMM] Estimate expected_m (#28694)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

f05d474c8a · Lukas Geiger · 2025-11-14 19:45:11 -08:00
  [Model][Qwen3VL] Use mm_position to compute mrope positions (#28730)
  Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
  Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

e0c910bb89 · Thomas Parnell · 2025-11-14 22:55:42 +00:00
  [Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295)
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

e5c78956c0 · Alexander Matveev · 2025-11-14 14:13:46 -08:00
  [Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740)
  Signed-off-by: Alexander Matveev <amatveev@redhat.com>

fd4555089a · Andrey Khalyavin · 2025-11-14 10:58:18 -08:00
  [BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728)
  Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>

cec275efce · GuanH · 2025-11-14 18:44:27 +00:00
  [Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663)
  Signed-off-by: GuanH <guansdrailib@gmail.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
a425dc256e · TJian · 2025-11-14 10:30:50 -08:00
  [Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716)
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

964d65deed · Fardin Hoque · 2025-11-14 13:27:56 -05:00
  LLaMA4 LoRA Adapter Enablement (#28602)
  Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
  Co-authored-by: Wei Wei <wwei6@meta.com>

5f3cd7f7f2 · Harry Mellor · 2025-11-14 16:34:14 +00:00
  [Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

c934caee88 · dongbo910220 · 2025-11-14 16:07:20 +00:00
  [Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711)
  Signed-off-by: dongbo910220 <1275604947@qq.com>

3f8a874065 · Duncan Moss · 2025-11-14 08:02:44 -08:00
  [Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134)
  Signed-off-by: Duncan Moss <djm.moss@gmail.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
  Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

433c0f8675 · zhaozx-cn · 2025-11-14 13:33:02 +00:00
  [Model] Fix bailing_moe accuracy problem (#28277)
  Signed-off-by: zhaozx-cn <zhaozx2116@163.com>

41b92f7d38 · Shanshan Shen · 2025-11-14 19:16:13 +08:00
  [Model][MM] Extract conv layer as CustomOp (#28455)
  Signed-off-by: shen-shanshan <467638484@qq.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

c36bcfe6b3 · Jiangyun Zhu · 2025-11-14 09:01:26 +00:00
  [Bugfix] fix dots.ocr pp support (#28705)
  Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

0b25498990 · haoyangli-amd · 2025-11-14 05:56:35 +00:00
  [Misc] add ignore mapper for quark quantization (#28275)
  Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

4d5943bda6 · Hank_ · 2025-11-14 01:24:10 +00:00
  [quantization][config] enable override existing quant_config (#28510)
  Signed-off-by: Hank <hcc.mayday@gmail.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>
fe1cd7704d · Varun Sundar Rabindranath · 2025-11-13 10:16:55 -08:00
  [Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

3035d1a166 · Yuanping Song · 2025-11-13 15:24:35 +00:00
  [BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617)
  Signed-off-by: Yuanping Song <yuanping.song@outlook.com>

c47b6c85ac · zofia · 2025-11-13 11:35:04 +00:00
  [XPU] add sym params to IPEXConfig (#28611)
  Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

5e973209aa · Zijing Liu · 2025-11-13 11:30:04 +00:00
  [BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter (#28603)
  Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

fa183e9271 · Jiangyun Zhu · 2025-11-13 07:59:58 +00:00
  [Bugfix] fix kimi-linear crash (#28445)
  Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

7e082bc14e · Lucia Fang · 2025-11-12 21:40:45 -08:00
  Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574)
  Signed-off-by: Lu Fang <fanglu@fb.com>

97d1c99302 · Harry Mellor · 2025-11-12 19:14:33 -08:00
  Rename clashing method names for vLLM model protocol (#27583)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2dacd57394 · wangxiyuan · 2025-11-13 08:48:47 +08:00
  [platform] Move get_cu_count to utils (#27005)
  Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

51c599f0ec · Harry Mellor · 2025-11-12 23:43:57 +00:00
  Skip models that cannot currently init on Transformers v5 (#28471)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

69d0e90313 · Alexander Matveev · 2025-11-12 23:37:24 +00:00
  [MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
  Signed-off-by: Alexander Matveev <amatveev@redhat.com>
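Commit 3035d1a166 (#28617) above applies the NoRepeatNGramLogitsProcessor to DeepSeek-OCR's greedy decoding path. The underlying no-repeat-n-gram rule is standard: ban any next token that would complete an n-gram already present in the generated sequence. A plain-Python sketch of that rule; the helper name is illustrative, not vLLM's API:

```python
def ban_repeated_ngrams(token_ids: list[int], ngram_size: int) -> set[int]:
    """Tokens that would complete an n-gram already seen in token_ids."""
    if len(token_ids) < ngram_size:
        return set()
    # The current (n-1)-token suffix; any historical n-gram starting with
    # this suffix yields a banned continuation token.
    prefix = tuple(token_ids[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned: set[int] = set()
    for i in range(len(token_ids) - ngram_size + 1):
        window = tuple(token_ids[i : i + ngram_size])
        if window[:-1] == prefix:
            banned.add(window[-1])
    return banned
```

A logits processor would set the logits of the returned token ids to -inf before sampling; the fix in #28617 is about making sure this also happens when decoding greedily, not only when sampling.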