2818 Commits

Author SHA1 Message Date
Cyrus Leung
d346ec695e
[CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-26 21:45:20 -07:00
Wentao Ye
c242c98031
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788) 2025-09-26 20:44:52 -07:00
WeiQing Chen
f1d53d150c
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
2025-09-27 03:35:47 +00:00
Bram Wasti
dc48ba0c75
Kernel-override Determinism [1/n] (#25603)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-09-26 16:59:09 -07:00
Michael Goin
f708bd4904
[CI] Add E2E Blackwell Quantized MoE Test (#25723)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-26 12:23:00 -07:00
阿丹(adan)
33f6aaf972
Eagle3 that supports the Minicpm3 model (#24243)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: liudan <adan@minicpm.com>
Co-authored-by: liudan <liudan@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
2025-09-26 10:04:57 -07:00
Isotr0py
d4d9899860
[Quantization] Add field to skip unquantized modules for GPTQ config (#25455)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-26 15:47:41 +00:00
Chih-Chieh Yang
2b6b1d7809
[Model] Mamba2 varlen refactor (#21467)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
2025-09-26 11:31:14 +00:00
Sage Moore
dfb9af2014
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-26 01:25:28 -07:00
Tao He
99b3a504c5
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-26 01:18:58 -07:00
xaguilar-amd
52621c8f5c
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703)
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>
2025-09-26 01:18:20 -07:00
Eugene Khvedchenya
392edee34a
EVS Support (Video tokens pruning) (#22980)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-26 11:54:54 +08:00
Aleksandr Malyshev
53a30845be
Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
2025-09-25 19:16:53 -06:00
Wentao Ye
9fe4c2bdb9
[Refactor] Remove DeepGEMM OP Register (#25710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-25 20:13:41 -04:00
Shu Wang
081b5594a2
Fix routing_bias dtype (#25711)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
2025-09-25 23:35:14 +00:00
tomeras91
57329a8c01
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-09-25 16:10:29 -07:00
Cyrus Leung
89fa54e6f7
[Optimization] Use a cheaper cache key in get_model_architecture (#25682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 17:54:20 -04:00
Matthew Bonanni
3468f17ebe
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-25 17:37:50 +00:00
Cyrus Leung
0ea80c87d9
[Model] Define merge_by_field_config MM interface (#25676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 17:13:07 +00:00
Michael Goin
916bd9204d
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" (#25681)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-25 09:45:06 -07:00
Isotr0py
03858e6d1c
[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-25 14:46:04 +00:00
Cyrus Leung
12c1287d64
[mypy] Further improve MM type annotations (#25654)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 10:57:36 +00:00
Isotr0py
17b4c6685c
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-25 18:36:01 +08:00
Roger Wang
7be9ffcd9f
[Misc] Fix Qwen3-VL video_grid_thw typing (#25646)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-25 10:16:45 +00:00
Tyler Michael Smith
1260180c67
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-25 08:05:21 +00:00
Jacob Kahn
bc092ea873
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-25 07:37:03 +00:00
Cyrus Leung
755ed7b05b
[Misc] Simplify PoolerOutput and move to v1/outputs (#25629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 06:47:03 +00:00
XuruiYang
845adb3ec6
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Saman A. Pour
90b139cfff
Enable Fbgemm NVFP4 on Dense models (#25609)
Signed-off-by: Saman Keon <samanamp@outlook.com>
2025-09-24 21:12:53 -07:00
Wentao Ye
4492e3a554
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-24 18:52:52 -07:00
Wei Wei
05c19485a5
[Kernel] Support DCP for Triton backend (#25132)
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-09-24 18:09:34 -07:00
Jee Jee Li
52d0cb8458
[Model] Improve DotsOCRForCausalLM (#25466)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-25 07:58:08 +08:00
Wentao Ye
1f29141258
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-24 18:52:36 -04:00
Duncan Moss
6160ba4151
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-09-24 18:50:04 -04:00
Harry Mellor
8c853050e7
[Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 19:30:33 +00:00
Shu Wang
54e42b72db
Support mnnvl all2allv from Flashinfer (#21003)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-24 14:38:16 -04:00
Cyrus Leung
9313be5017
[Misc] Improve type annotations for jsontree (#25577)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-24 22:49:58 +08:00
Woosuk Kwon
2e19a848d4
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-24 01:51:39 -07:00
Cyrus Leung
6488f3481b
[Misc]] Move processing context to multimodal directory (#25548)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-24 08:15:00 +00:00
Isotr0py
27ec3c78f3
[CI/Build] Fix v1 OOT registration test (#25547)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-24 08:03:13 +00:00
Li, Jiang
1cbcfb94de
[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-24 06:21:51 +00:00
Corey Lowman
d747c2ef18
[Perf] Fix jit compiles at runtime of fla gated delta rule (#25432)
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-24 11:16:13 +08:00
Yong Hoon Shin
77d906995c
[KV sharing] Re-land Gemma3n model changes from #22628 (#24357)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-09-23 19:25:34 -07:00
Nikhil Gupta
359d293006
[fix]: add Arm 4bit fused moe support (#23809)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-09-24 01:32:22 +00:00
Kyle Sayers
de94289a98
[Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-23 18:30:26 -06:00
Wentao Ye
88d7bdbd23
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' (#25519)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-24 00:07:51 +00:00
Chenxi Yang
0d235b874a
Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302)
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
2025-09-23 18:07:42 -06:00
Juan Villamizar
bde2a1a8a4
[ROCm] Small functional changes for gptoss (#25201)
Signed-off-by: jpvillam <jpvillam@amd.com>
Co-authored-by: jpvillam <jpvillam@amd.com>
2025-09-23 23:39:50 +00:00
Thomas Parnell
5e25b12236
[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel (#25197)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
2025-09-23 23:23:30 +00:00
Michael Goin
7361ab379f
Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-23 22:48:40 +00:00