Cyrus Leung
0e741c12e3
[Bugfix] Fix Plamo3 rope handling (#29092)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 11:38:35 +08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Fanli Lin
a2e9ebe9e2
[BugFix] Fix flash_attn import in siglip2navit.py (#29082)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-11-20 12:14:29 +00:00
Zhewen Li
93c8672ceb
[Bugfix] Fix spec decode memory regression after #28549 (#28819)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-20 19:05:50 +08:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 (#28834)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 02:54:01 -08:00
Anna Shors
6eb745d9bd
Add truncate arg to yarn to match openai implementation of gpt-oss (#28244)
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-20 18:53:50 +08:00
Dezhan
dc45efc8ef
[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577)
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-20 02:52:36 -08:00
Pleaplusone
7218f83992
[ROCm][BugFix] Fix shared expert loading error when disable VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS (#28633)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 14:50:23 +07:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00
Isotr0py
64192d5624
[Bugfix] Revert custom attention mask for gemma3-mm (#28995)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 13:23:22 +08:00
Wentao Ye
5031cd5d55
[Refactor] Optimize select_experts (#28069)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 18:53:15 -05:00
JartX
8e38e99829
[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849)
2025-11-19 18:30:08 -05:00
Wentao Ye
0075bfffd4
[CI] Fix precommit rope_theta issue (#29040)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 14:22:43 -08:00
Yongye Zhu
88f5b19f0b
[DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-11-19 16:30:04 -05:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Izzy Putterman
02f5903b84
Eagle: MM Cuda Graphs with MRope (#28896)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-19 15:01:05 -05:00
Yuxuan Zhang
0c80efd94f
GLM-V video segmentation solution adjustment (#28941)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-11-19 17:32:55 +00:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Shanshan Shen
d44e9df7d4
[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-19 16:24:55 +00:00
Harry Mellor
4f5299f717
Relax Transformers modeling backend MoE experts check (#28952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 21:50:30 +08:00
Lukas Geiger
3d4e7d34be
[Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 05:43:01 +00:00
Gleb Kurchanov
73ff872db0
[Bugfix] Fix typo in Qwen3 Next model executor (#28960)
Signed-off-by: Gleb Kurchanov <nepherpitou@gmail.com>
2025-11-19 05:21:02 +00:00
Jerry Zhang
da94c7c0eb
Move online quantization to model.load_weights (#26327)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-18 16:52:41 -08:00
tomeras91
1395461f5f
[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-18 16:49:36 -08:00
Isotr0py
e4bb2684bc
[Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 18:56:04 +00:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
Canlin Guo
b9489f51e1
[Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-18 11:51:54 +00:00
Ning Xie
0168f69e50
[Misc] Remove unnecessary parentheses from log statements (#28897)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-17 20:33:46 -08:00
Pranav
f77bce001a
[Model] Add Afmoe architecture implementation (#28332)
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Signed-off-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
2025-11-17 15:11:20 -08:00
Shreyas Kulkarni
95ae50b7d1
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
2025-11-17 15:01:34 -08:00
wuyaoxuehun
ab01cd14e5
[BugFix] Fix glm4_moe_mtp load weights bug (#28805)
Signed-off-by: wuyaoxuehun <798143193@qq.com>
2025-11-17 17:13:11 +08:00
Lukas Geiger
5a87076d6e
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-16 17:37:15 +00:00
Anna Shors
8d259fad6c
Fix gpt oss weight loading with EP + bf16 (#28765)
Signed-off-by: ashors1 <ashors@nvidia.com>
2025-11-16 13:12:45 +00:00
Dezhan
af02c40970
Fixed gpt-oss _load_weights_other() parameter position bug (#28715)
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-16 09:46:29 +00:00
Lukas Geiger
07cadab27a
[Model][Qwen3VL] Cache positional embedding indices (#28475)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-15 19:03:09 +00:00
Eldar Kurtić
e439c784fa
Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-15 06:12:02 -08:00
hwhaokun
085a525332
[Model] Fix lmhead init bug of bailing_moe (#28777)
Signed-off-by: hwhaokun <haokun0405@163.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-15 05:44:12 -08:00
tingtinggithub
cb15ee28db
Allow Gemma3 to take image embeddings (#28483)
Signed-off-by: tingtinggithub <streamttt@gmail.com>
2025-11-15 04:18:08 -08:00
Lukas Geiger
f05d474c8a
[Model][Qwen3VL] Use mm_position to compute mrope positions (#28730)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 19:45:11 -08:00
GuanH
cec275efce
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663)
Signed-off-by: GuanH <guansdrailib@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 18:44:27 +00:00
Fardin Hoque
964d65deed
LLaMA4 LoRA Adapter Enablement (#28602)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
2025-11-14 13:27:56 -05:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 16:34:14 +00:00
dongbo910220
c934caee88
[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-11-14 16:07:20 +00:00
zhaozx-cn
433c0f8675
[Model] Fix bailing_moe accuracy problem (#28277)
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-14 13:33:02 +00:00
Shanshan Shen
41b92f7d38
[Model][MM] Extract conv layer as CustomOp (#28455)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 19:16:13 +08:00
Jiangyun Zhu
c36bcfe6b3
[Bugfix] fix dots.ocr pp support (#28705)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-14 09:01:26 +00:00
Yuanping Song
3035d1a166
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617)
Signed-off-by: Yuanping Song <yuanping.song@outlook.com>
2025-11-13 15:24:35 +00:00
Harry Mellor
97d1c99302
Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 19:14:33 -08:00
Harry Mellor
51c599f0ec
Skip models that cannot currently init on Transformers v5 (#28471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 23:43:57 +00:00