Wentao Ye
541a2ef892
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. ( #29546 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 20:31:14 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size ( #29642 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config ( #30181 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Cyrus Leung
c46b932df2
[Chore] Deprecate SupportsMultiModal.merge_by_field_config ( #30170 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 07:57:28 +00:00
Peter Salas
e858bc4d14
[Model] Add support for transformer-based Ultravox v0.7 projector ( #30089 )
...
Signed-off-by: Peter Salas <peter@fixie.ai>
2025-12-05 20:55:43 -08:00
Dongjie Zou
e3fbb6f152
fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb ( #30093 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-05 20:55:09 -08:00
yuttian1
c4d62618ca
Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend ( #30102 )
...
Signed-off-by: yuttian1 <yuttian@amd.com>
2025-12-05 20:54:38 -08:00
rasmith
dc839ad03d
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round ( #30151 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-05 20:52:11 -08:00
Wentao Ye
7b5575fa7d
[Bug] Fix vLLM config is not set error ( #29999 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-05 16:42:12 -05:00
Divakar Verma
962d703818
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute ( #29926 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-05 19:57:26 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments ( #26315 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Yi Liu
0d8a7d8a26
[Compressed Tensors] Add XPU wNa16 support ( #29484 )
...
Signed-off-by: yiliu30 <yi4.liu@intel.com>
2025-12-05 22:02:09 +08:00
Zhiwei
3628bcaaf2
[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe ( #29775 )
...
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
2025-12-05 11:01:16 +00:00
amitz-nv
6038b1b04b
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH ( #29978 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2025-12-05 00:34:33 -08:00
Alexander Matveev
4470ee2f90
[Perf] Enable separate shared_experts stream only for CUDA ( #30085 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-12-05 00:03:17 +00:00
Harry Mellor
e10c84e06a
Access partial_rotary_factor from rope_parameters ( #29966 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 18:42:49 +00:00
Jee Jee Li
652ba93da3
[Bugfix] Fix FP8 MoE LoRA ( #29890 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 18:17:49 +00:00
Tao Yun
6dcb07f676
support qwen3-vl handle requests with embeddings ( #30037 )
...
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Cyrus Leung
b286a311c2
[Chore] Deprecate merge_by_field_config arg ( #30035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 17:21:24 +00:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM ( #30049 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata ( #30027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni ( #29974 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel ( #29470 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 18:04:59 +00:00
HDCharles
b294e28db2
[refactor] CTMoEMethods to use QuantizationArgs ( #28871 )
...
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 11:00:56 +00:00
Tsukasa OI
42c1949643
[Bugfix][Quantization] Support BF16 tensors on GGUF ( #29948 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-03 10:33:46 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True ( #29950 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:05:10 +00:00
Julien Denize
5e5646e206
[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention ( #29908 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-02 14:51:20 -08:00
Harry Mellor
6fc5841db1
Fix some more Transformers nightly tests ( #29872 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 21:49:44 +00:00
Navanit Dubey
a2b053dc85
feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE ( #29896 )
...
Signed-off-by: navanit-git <navanitdubey@gmail.com>
2025-12-02 19:28:35 +00:00
Matthew Bonanni
1d93f11675
[Attention][CUDAGraph] Remove CG padding from attention backends ( #29352 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-02 13:48:08 -05:00
Isotr0py
0ec8422171
[Bugfix] Fix incorrect channel order for idefics3 in edge case ( #29881 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 16:03:52 +00:00
Matthew Bonanni
51c57b51dd
[Bugfix] Fix DeepSeek R1 MTP weight loading ( #29545 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-12-02 15:52:18 +00:00
Cyrus Leung
68ffbca7e4
[Chore] Use tokenizer.encode and tokenizer.decode directly ( #29851 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 12:30:40 +00:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 ( #29757 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
2025-12-02 10:29:00 +00:00
Harry Mellor
f5b0846ba0
Fix some Transformers nightly tests ( #29802 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 07:05:27 +00:00
Cyrus Leung
653591d5e7
[Chore] Move tokenizer initialization methods ( #29793 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 13:33:37 +08:00
Johnny Yang
f441d36cee
Add missing return in _check_vllm_model_embed_input_ids ( #29834 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-12-01 19:22:50 -08:00
Divakar Verma
4b40924998
[ROCm] Fallback pytorch GELU with tanh approximation to GELU() ( #29244 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 02:02:22 +00:00
sangbumlikeagod
092bb73b8a
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation ( #24209 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2025-12-01 18:19:17 +01:00
Fanli Lin
f37e8938d2
[XPU] Fix AWQ skipped layer detection in IPEX quantization ( #29774 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-01 12:00:52 +00:00
Shu Wang
f72a817bdf
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch ( #27141 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-30 16:05:32 -08:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig ( #29749 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 17:14:23 +00:00
Omer Ullman Argov
39d28108f4
[Feat] Support non-gated activations in NVFP4 modelopt path ( #29004 )
2025-11-30 11:02:40 -05:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size ( #29741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 17:31:12 +08:00
Isotr0py
47539cfd3e
[Bugfix] Fix mismatched nvfp4 gemm output shape ( #29742 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-30 09:15:01 +00:00