elvischenv
6330f9477d
[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-25 07:59:40 +00:00
Fadi Arafeh
98caeadd54
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei ( #29273 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-25 15:11:11 +08:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support ( #29327 )
...
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-25 03:28:51 +00:00
Michael Goin
6f1355a1b7
[Perf] Disable DeepGEMM MoE by default when TP=8 is used ( #29346 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 19:01:40 -07:00
Hanjie Qiu
5f9679a43b
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states ( #27688 )
...
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-24 20:13:12 -05:00
Wentao Ye
699bca76c0
[UX] Raise error for attn backend of batch invariant ( #29348 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-24 17:49:01 -07:00
Michael Goin
c17610e2ba
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 ( #29339 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 18:22:46 -05:00
Yan Ma
3cfa63ad99
[XPU] Fix Kimi-VL-A3B-thinking on XPU ( #29309 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-11-24 21:02:21 +00:00
Chenheli Hua
839c6b7b72
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. ( #27721 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-24 19:24:37 +00:00
bnellnm
8f066146c3
[MoE][Refactor] Make select_experts a non-static method ( #29067 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-11-24 13:38:04 -05:00
Laith Sakka
7a228b5305
Add option to use unbacked and backed size obl dynamic shapes for more sound compilation. ( #26199 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-24 10:12:41 -05:00
杰兮
8005e606bf
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP ( #27563 )
...
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-11-24 10:16:52 +00:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers ( #29262 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Zero
30854783ad
[Model] Add OpenCUA-7B support ( #29068 )
...
Signed-off-by: lim4349 <rockmanzero@naver.com>
Signed-off-by: Zero <rockmanzero@naver.com>
Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-24 10:27:55 +08:00
Jee Jee Li
1073ba68b0
[LoRA] Optimize 3D MoE logic ( #29222 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-24 10:27:23 +08:00
jiahanc
5f96c00c55
[Fix] Add SM check to flashinfer MOE backend ( #29144 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-23 00:39:30 +00:00
Federico
f55c76c2b3
chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning ( #29240 )
2025-11-22 08:42:48 -08:00
ZiTian Zhao
d84d8f4429
Fix EVS crash when using video_embeds inputs in Qwen2.5-VL ( #29232 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 06:48:59 -08:00
Cyrus Leung
ae66818379
[Misc] Fix pre-commit ( #29238 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-22 06:48:01 -08:00
Bram Wasti
5f7209a793
[tiny] Remove unsupported TRITON_MLA backend from batch invariance ( #28832 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-22 21:00:50 +08:00
Nandan Vallamdasu
6965a392a4
Fix: Resolve circular import in model_loader/utils.py ( #29189 )
...
Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com>
Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 04:58:22 -08:00
jinghanhu
988ee66b0d
Handle triton kernel import exception ( #29062 )
2025-11-22 10:07:50 +00:00
FlintyLemming
052950e5b3
Add fused MoE config for H200 E160 N192 fp8 ( #29182 )
...
Signed-off-by: FlintyLemming <admin@flinty.moe>
2025-11-21 17:37:51 -08:00
Lukas Geiger
d045e22dfe
[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s ( #29217 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-21 17:30:55 -08:00
Varun Sundar Rabindranath
3137991f55
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor ( #29162 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-21 14:28:17 -08:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist ( #28659 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Ning Xie
53a1ba6ec5
[log] add weights loading time log to sharded_state loader ( #28628 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-21 21:06:09 +00:00
Lucas Wilkinson
1840c5cb18
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case ( #27426 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-21 11:41:52 -08:00
Mingyuan Ma
b4c8fbaae2
Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod ( #28892 )
...
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 09:54:11 -07:00
rasmith
e99e467384
[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py ( #29132 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-21 11:53:09 -05:00
Wentao Ye
a42ab317ac
[Log] Optimize startup log ( #28948 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-21 08:46:20 -08:00
Aleksandr Malyshev
b7f1f490a6
Upstream triton fp4 weight preshuffle ( #28888 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-11-21 11:34:46 -05:00
Russell Bryant
cca2d2cdbe
[Core] Align whisper closer to other multimodal models ( #27292 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-11-21 12:01:54 +00:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references ( #29088 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
Huamin Li
8ac3a41487
[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers ( #29111 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 23:53:30 -08:00
Cyrus Leung
0e741c12e3
[Bugfix] Fix Plamo3 rope handling ( #29092 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 11:38:35 +08:00
Wentao Ye
56669c1f29
[CI] Fix mypy for vllm/v1/worker ( #29037 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 11:36:07 +08:00
Hongxia Yang
3f5f36da3f
[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving ( #29127 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
2025-11-21 03:30:07 +00:00
Wentao Ye
e1eefa4c40
[Bug] Fix torch warning of tf32 usage ( #29112 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 01:54:59 +00:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab ( #28545 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Wentao Ye
df44df0143
[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement ( #28879 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 18:41:49 -07:00
Fanli Lin
a2e9ebe9e2
[BugFix] Fix flash_attn import in siglip2navit.py ( #29082 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-11-20 12:14:29 +00:00
Zhewen Li
93c8672ceb
[Bugfix] Fix spec decode memory regression after #28549 ( #28819 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-20 19:05:50 +08:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 ( #28834 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA ( #26670 )
...
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 02:54:01 -08:00
Anna Shors
6eb745d9bd
Add truncate arg to yarn to match openai implementation of gpt-oss ( #28244 )
...
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-20 18:53:50 +08:00
Dezhan
dc45efc8ef
[BugFix] Fix Llama4 Pipeline Parallelism Assert Error ( #28577 )
...
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-20 02:52:36 -08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning "Dynamo detected a call to a functools.lru_cache" ( #29038 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Pleaplusone
7218f83992
[ROCm][BugFix] Fix shared expert loading error when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS is disabled ( #28633 )
...
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 14:50:23 +07:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat ( #28964 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00