Varun Sundar Rabindranath
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection ( #27484 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-24 23:29:24 +00:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00
Isotr0py
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path ( #27480 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-24 17:43:45 +00:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00
Isotr0py
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer ( #27418 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-24 07:32:47 +00:00
Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
Harry Mellor
1f9460c4c1
Fix pooling adapters for Transformers backend ( #27338 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-23 20:23:55 -07:00
xiao-llm
70022ffc00
Granite 4.0 quark quantization support ( #26944 )
...
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
2025-10-24 02:14:03 +00:00
Akash kaothalkar
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc ( #27422 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-10-23 21:21:36 +00:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-10-23 20:19:48 +00:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 19:08:06 +00:00
Isotr0py
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping ( #27416 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-23 18:30:28 +00:00
Gregory Shtrasberg
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek ( #27373 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-10-23 17:43:53 +00:00
Alexander Matveev
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer ( #27220 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-24 00:03:14 +08:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
2025-10-23 14:46:18 +00:00
Cyrus Leung
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry ( #27353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-23 14:42:40 +00:00
Bradley D
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used ( #27124 )
...
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-23 20:09:52 +08:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH ( #25863 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-10-23 10:27:23 +00:00
Isotr0py
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support ( #27361 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 17:15:38 -07:00
Luciano Martins
e05a6754a8
[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… ( #27309 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2025-10-22 10:05:34 -07:00
Isotr0py
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing ( #27330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 08:39:23 -07:00
Cyrus Leung
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods ( #27342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 08:38:34 -07:00
Reinforce-II
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage ( #26789 )
...
Signed-off-by: Reinforce-II <fate@eastal.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-22 08:16:09 -07:00
Wentao Ye
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue ( #27267 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 11:00:10 -04:00
Isotr0py
675aa2ec64
[Model] Upstream Deepseek-OCR model ( #27247 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-22 07:59:15 -07:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils ( #27207 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 10:44:21 -04:00
Jiangyun Zhu
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled ( #27146 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-22 00:22:39 -04:00
Lain
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer ( #26465 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-21 23:34:03 +00:00
Alexander Matveev
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE ( #26440 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-21 21:38:29 +00:00
Wentao Ye
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell ( #27229 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-21 10:25:55 -07:00
JartX
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 ( #27187 )
...
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-10-21 10:43:23 -04:00
Daniel Cámpora
80e9452984
[Deepseek v3.2] Optimize top_k_per_row ( #26763 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-21 08:30:07 +00:00
Roger Wang
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-21 00:30:10 -07:00
Zebing Lin
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization ( #27136 )
...
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-10-20 23:19:00 -07:00
Benjamin Chislett
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales ( #27227 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-10-20 22:51:44 -07:00
Varun Sundar Rabindranath
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-21 01:51:14 -04:00
Shu Wang
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 ( #26135 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-21 01:50:31 -04:00
Po-Han Huang (NVIDIA)
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue ( #24032 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-10-21 04:03:47 +00:00
Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA ( #21229 )
...
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
2025-10-21 03:01:37 +00:00
Fadi Arafeh
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
2025-10-21 02:02:58 +00:00
Isotr0py
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field ( #26909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-21 01:49:28 +00:00
Heng Guo
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization ( #23812 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
shivampr
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) ( #26268 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com>
2025-10-20 07:48:01 -07:00
Eugene Khvedchenya
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support ( #27107 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-20 22:19:11 +08:00
Jiangyun Zhu
9fce7bee74
[Kernel] Accelerate solve_tril with TMA ( #26746 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-20 05:39:02 +00:00
Yi Zhang
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
...
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com>
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-20 02:31:26 +00:00
Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests ( #27169 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-19 05:20:55 -07:00
Jianyu Huang
221bf72577
output type conversion fix ( #27159 )
2025-10-19 08:10:07 +00:00
Lucas Wilkinson
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 ( #27121 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00