Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding ( #21347 )
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
Isotr0py
a4528f0cac
[Model]: Fused MoE for nomic-embed-text-v2-moe ( #18321 )
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-29 03:13:27 -07:00
Benji Beck
f1e2c095ec
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema ( #21684 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 22:09:45 -07:00
Wentao Ye
48b763d6b5
[Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod ( #21775 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-28 19:47:21 -06:00
Nikhil Gupta
89ac266b26
[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels ( #17112 )
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-28 20:55:15 +00:00
rasmith
b361f14e39
[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile ( #21350 )
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-07-28 15:38:20 -04:00
Cyrus Leung
e17a4d3bf9
[Bugfix] Fix granite speech shape validation ( #21762 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 14:19:21 -04:00
Anton Vlasjuk
656c24f1b5
[Ernie 4.5] Name Change for Base 0.3B Model ( #21735 )
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 12:22:32 +00:00
Isotr0py
0ae970ed15
[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme ( #21744 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 04:26:49 -07:00
Jee Jee Li
1b769dccf3
[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts ( #21717 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-28 11:02:25 +00:00
Cyrus Leung
a4ed731546
[Model] Prioritize Transformers fallback over suffix matching ( #21719 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 02:15:31 -07:00
Benji Beck
d128d0d554
Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema ( #21686 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 01:16:35 -07:00
Asaf Joseph Gardin
a6c050286a
[v1][mamba] Added mamba_type into MambaSpec ( #21715 )
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-07-28 08:15:55 +00:00
Cyrus Leung
139a97ec56
[Bugfix] Fix shape checking for Fuyu ( #21709 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 00:05:56 -07:00
Benji Beck
3ea57a56d9
Migrate Idefics3ImagePixelInputs and Idefics3ImageEmbeddingInputs to … ( #21683 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-27 22:37:23 -07:00
Benji Beck
75856bc2cb
Migrate GraniteSpeechAudioInputs to TensorSchema ( #21682 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-07-27 22:37:20 -07:00
Benji Beck
304dcdf575
Migrate GLMVImagePixelInputs to TensorSchema ( #21679 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-27 22:36:11 -07:00
Benji Beck
88e46c7c8d
Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema ( #21678 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-07-27 22:36:08 -07:00
Benji Beck
d8937de4c8
Migrate Gemma3ImagePixelInputs to TensorSchema ( #21676 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-27 22:36:05 -07:00
TJian
e626d286f5
[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel ( #21242 )
2025-07-28 05:07:06 +00:00
Shinichi Hemmi
c7ffe93d9c
[Model] Support TP/PP/mamba2 kernel for PLaMo2 ( #19674 )
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
2025-07-28 05:00:47 +00:00
Jee Jee Li
04ff4be310
[Misc] Add fused_moe configs for Qwen3-Coder-480B-A35B-Instruct-FP8 ( #21700 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-27 20:12:18 -07:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Caleb_Du
57c22e57f9
Fix CUDA permute/unpermute for use with DeepGemm Moe ( #17934 )
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
2025-07-27 07:08:00 -07:00
Wentao Ye
bda9d0535f
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor ( #21631 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-27 05:25:21 -07:00
Isotr0py
3d847a3125
[VLM] Add video support for Intern-S1 ( #21671 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-27 11:49:43 +00:00
Benji Beck
5f8c9a425e
Migrate Florence2ImagePixelInputs to TensorSchema ( #21663 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-27 02:43:02 -07:00
Isotr0py
eed2f463b2
[VLM] Support HF format Phi-4-MM model ( #17121 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-26 20:07:57 -07:00
Benji Beck
20950b29fb
Migrate ChameleonImagePixelInputs to TensorSchema ( #21657 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 19:34:25 -07:00
Benji Beck
3339cba3ff
Migrate FuyuImagePatchInputs to TensorSchema ( #21662 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 19:34:14 -07:00
Benji Beck
0b8caf9095
Migrate DeepseekVL2ImageInputs to TensorSchema ( #21658 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 19:34:11 -07:00
Benji Beck
ccf27cc4d4
Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema ( #21656 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-27 10:33:52 +08:00
Jinzhen Lin
c657369841
support torch.compile for bailing moe ( #21664 )
2025-07-26 23:54:32 +00:00
Wenchen Lo
6c66f28fa5
Remove xformers requirement for Mistral-format Pixtral and Mistral3 ( #21154 )
Signed-off-by: Wenchen Lo <charles761013@gmail.com>
2025-07-26 17:20:29 -06:00
Kaixi Hou
de509ae8eb
[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels ( #21411 )
Signed-off-by: kaixih <kaixih@nvidia.com>
2025-07-26 07:10:36 -07:00
Wentao Ye
56e544f24b
[Refactor] Remove moe_align_block_size_triton ( #21335 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-26 07:08:29 -07:00
Maximilien de Bayser
1cd6eaba54
Support encoder-only models without KV-Cache ( #21270 )
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-07-26 21:09:52 +08:00
Benji Beck
de10ff0b7c
Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation ( #21622 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 06:08:18 -07:00
Benji Beck
9d197280fa
Migrate AriaImagePixelInputs to TensorSchema for shape validation ( #21620 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 06:08:15 -07:00
Lyu Han
875af38e01
Support Intern-S1 ( #21628 )
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-26 19:14:04 +08:00
Farzad Abdolhosseini
62965de5fe
[Model] Ultravox: Support Llama 4 and Gemma 3 backends ( #17818 )
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
Signed-off-by: Patrick Li <patrick8289@gmail.com>
Co-authored-by: Patrick Li <patrick8289@gmail.com>
2025-07-25 18:12:31 -07:00
Alex Kogan
7ae75fa6d0
[Feature] Add support for MoE models in the calibration-free RTN-based quantization ( #20766 )
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
2025-07-25 18:09:34 -07:00
Wentao Ye
75d29cf4e1
[Perf] Cuda Kernel for Int8 Per Token Group Quant ( #21476 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-25 17:07:07 -07:00
mgazz
e189b50f53
Add support for Prithvi in Online serving mode ( #21518 )
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-25 07:01:27 -07:00
Chih-Chieh Yang
eab2f3980c
[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel ( #20839 )
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
2025-07-25 06:49:36 -07:00
bigshanedogg
29c6fbe58c
[MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B ( #20931 )
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2025-07-25 06:05:42 -07:00
xyxinyang
c72f049cb4
[Model] Fix Ernie4.5MoE e_score_correction_bias parameter ( #21586 )
Signed-off-by: zhouchong <zhouchong03@baidu.com>
Co-authored-by: zhouchong <zhouchong03@baidu.com>
2025-07-25 06:02:53 -07:00
Cyrus Leung
46d81d6951
[V1] Get supported tasks from model runner instead of model config ( #21585 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-25 05:36:45 -07:00
Jee Jee Li
5c3f2628d5
[Quantization] Enable BNB support for more MoE models ( #21370 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-25 03:57:34 -07:00
Xu Wenqing
8ed01e32f7
Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct ( #21598 )
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-07-25 02:36:55 -07:00