Lukas Geiger
07cadab27a
[Model][Qwen3VL] Cache positional embedding indices ( #28475 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-15 19:03:09 +00:00
Eldar Kurtić
e439c784fa
Add support for Eagle with separate lm-head and embed_tokens layers ( #28549 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-15 06:12:02 -08:00
hwhaokun
085a525332
[Model] Fix lmhead init bug of bailing_moe ( #28777 )
...
Signed-off-by: hwhaokun <haokun0405@163.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-15 05:44:12 -08:00
tingtinggithub
cb15ee28db
Allow Gemma3 to take image embeddings ( #28483 )
...
Signed-off-by: tingtinggithub <streamttt@gmail.com>
2025-11-15 04:18:08 -08:00
Lukas Geiger
f05d474c8a
[Model][Qwen3VL] Use mm_position to compute mrope positions ( #28730 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 19:45:11 -08:00
GuanH
cec275efce
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure ( #28663 )
...
Signed-off-by: GuanH <guansdrailib@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 18:44:27 +00:00
Fardin Hoque
964d65deed
LLaMA4 LoRA Adapter Enablement ( #28602 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
2025-11-14 13:27:56 -05:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend ( #28725 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 16:34:14 +00:00
dongbo910220
c934caee88
[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL ( #28711 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-11-14 16:07:20 +00:00
zhaozx-cn
433c0f8675
[Model] Fix bailing_moe accuracy problem ( #28277 )
...
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-14 13:33:02 +00:00
Shanshan Shen
41b92f7d38
[Model][MM] Extract conv layer as CustomOp ( #28455 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 19:16:13 +08:00
Jiangyun Zhu
c36bcfe6b3
[Bugfix] fix dots.ocr pp support ( #28705 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-14 09:01:26 +00:00
Yuanping Song
3035d1a166
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path ( #28617 )
...
Signed-off-by: Yuanping Song <yuanping.song@outlook.com>
2025-11-13 15:24:35 +00:00
Harry Mellor
97d1c99302
Rename clashing method names for vLLM model protocol ( #27583 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 19:14:33 -08:00
Harry Mellor
51c599f0ec
Skip models that cannot currently init on Transformers v5 ( #28471 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 23:43:57 +00:00
Canlin Guo
bc5bd45c7d
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL ( #28271 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-12 15:56:47 +00:00
Jee Jee Li
a9d18b5107
[Bugfix] Fix gpt_oss packed_modules_mapping ( #28536 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-12 21:02:06 +08:00
wuyaoxuehun
d3ade61e42
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. ( #27597 )
...
Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com>
Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com>
2025-11-12 10:14:00 +00:00
yyzxw
1761dea1a8
[BugFix]: --enable-lora with model granite-4.0-micro crash ( #27733 )
...
Signed-off-by: zxw <1020938856@qq.com>
2025-11-12 09:03:56 +00:00
Fanli Lin
b9ce9a3013
[BugFix] Add fallback path in apply_rotary_pos_emb_flashattn for non-cuda platforms ( #28447 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-12 03:13:21 +00:00
Lukas Geiger
cbb799e314
[Model][Qwen3VL] Simplify get_mrope_input_positions using numpy ( #28302 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-12 02:55:10 +00:00
Jee Jee Li
9d1c474704
[LoRA][1/N]Remove LoRA extra vocab ( #28382 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-11 11:06:21 -08:00
Fanli Lin
d5edcb8678
[BugFix] Fix Siglip2Attention on XPU ( #28448 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-11 18:18:02 +00:00
xuebwang-amd
5a1271d83a
[Quantization] fix attention quantization of gpt_oss model ( #27334 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
2025-11-11 12:06:00 -05:00
Fanli Lin
b886068056
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU ( #28444 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-11 15:29:33 +00:00
Cyrus Leung
afffd3cc8a
[Model] Pass mm_features directly into get_mrope_input_positions ( #28399 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 21:14:48 +08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Lukas Geiger
9973e6e04a
[Model][Qwen3VL] Slightly speedup fast_pos_embed_interpolate ( #28434 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-11 10:35:10 +00:00
Fanli Lin
c7991269dd
[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla' ( #28387 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-11 08:45:38 +00:00
Jiangyun Zhu
f0359fffa4
[Bugfix] fix qwen3-next crash ( #28202 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-11 08:24:28 +00:00
Roger Wang
4fd4b743a2
[Bugfix] Fix max image size for PaddleOCR-VL ( #28442 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-11 08:07:24 +00:00
jiahanc
34553b9d27
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next ( #27492 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 12:34:57 -05:00
Cyrus Leung
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE ( #28395 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 00:30:06 +08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops ( #24490 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Ferrebo
912744d066
[Fix] optimize visual token mask with caching and multi-token support ( #28374 )
...
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
Yu Jiaqi
15be507c86
[bugfix] fix siglip batch text output error ( #28365 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-11-10 21:21:15 +08:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-09 14:26:35 -07:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-08 23:59:24 -08:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2025-11-08 15:50:10 -07:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 05:01:27 +00:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU ( #28168 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" ( #28285 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #28259 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-11-07 19:38:38 +08:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 ( #28257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 04:23:17 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile ( #28242 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-07 00:16:03 +00:00
Julien Denize
7a8375f8a0
Add llama 4 scaling support ( #28145 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-11-06 18:55:17 +00:00
Seungduk Kim
201dc98acc
Fix hard-coded parameter name in gemma3n.py ( #27946 )
...
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-11-05 23:07:36 -08:00
Isotr0py
43ecd0a900
[Chore] Clean up deepseek v2/v3 config copy ( #28055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-06 03:46:30 +00:00
Vadim Gimpelson
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 ( #28083 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-05 17:01:12 -08:00
wang.yuqi
802748bddb
[Bugfix] Fix Qwen3-Reranker-8B load ( #28117 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-11-05 18:33:50 +00:00