Benji Beck
|
a70d0bd0a3
|
Migrate LlavaOnevisionMultiInputs to TensorSchema (#21844)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-19 17:02:02 +00:00 |
|
yiz-liu
|
4f510bc2a1
|
[Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-08-19 16:18:41 +00:00 |
|
TJian
|
1298c67795
|
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-19 15:25:57 +00:00 |
|
myselvess
|
b87cb97a53
|
[Model] support new model ovis2.5 (#23084)
Signed-off-by: myselvess <244285088@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 13:12:59 +00:00 |
|
qizixi
|
5bfe0dea7a
|
[bug fix] Fix llama4 spec decoding (#22691)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-08-19 08:53:24 +00:00 |
|
Isotr0py
|
31fd3265c8
|
[Bugfix] Fix broken Minimax-01-VL model (#22116)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 08:49:29 +00:00 |
|
qizixi
|
4efd43e9b4
|
Fix GLM-4.5V-FP8 numerical issue (#22949)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 07:56:31 +00:00 |
|
Jiangyun Zhu
|
fda9537c5e
|
[Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 (#23114)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 14:24:31 +08:00 |
|
Benji Beck
|
e75f342261
|
Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema (#22023)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:48:26 +08:00 |
|
Raushan Turganbay
|
0e3bb543f0
|
[Bugfix] Support compile for Transformers multimodal (#23095)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-08-18 13:35:48 +00:00 |
|
Cyrus Leung
|
d3f71f1224
|
[Refactor] Get prompt updates earlier (#23097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 12:31:53 +00:00 |
|
Cyrus Leung
|
27e8d1ea3e
|
[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 09:52:00 +00:00 |
|
double7
|
9f1c642254
|
[Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)
Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com>
Co-authored-by: 杨森 <yangsen.double7@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-17 22:09:11 -07:00 |
|
Woosuk Kwon
|
c55bc1db26
|
[Misc] Remove dead return (#23061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-17 10:36:46 -07:00 |
|
947132885
|
fe0411fc6f
|
[Bugfix] should use stack instead of concat (#22972)
Signed-off-by: 947132885 <947132885@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-17 08:46:36 +00:00 |
|
Jee Jee Li
|
4d4061b6e7
|
[Kernel] Add cuda kernel for gpt_oss activation (#22951)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-17 05:03:24 +00:00 |
|
Cyrus Leung
|
5c32143b9d
|
[Refactor] Defer tensor data construction in MultiModalKwargs (#23030)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-16 21:05:50 -07:00 |
|
汪志鹏
|
829bbd7882
|
[New Model]mBART model (#22883)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-16 12:16:58 +00:00 |
|
Isotr0py
|
cc826a202b
|
[Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs (#22867)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-16 00:44:50 -07:00 |
|
Benjamin Chislett
|
fbd88728b3
|
[Bugfix] Fix DeepSeek MTP (#22934)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-08-16 01:25:06 +00:00 |
|
Thomas Parnell
|
f5d412bafb
|
[BugFix] Fix regression caused by mamba state dtype PR (#22998)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-15 22:55:26 +00:00 |
|
Chih-Chieh Yang
|
6cd69f51bf
|
[Model] Granite-4 support loading quantized checkpoint (#22925)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-08-15 18:47:56 +00:00 |
|
Thomas Parnell
|
75531a6c13
|
[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-15 12:57:06 +00:00 |
|
Simon Mo
|
f1f0d2fab8
|
Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948)
|
2025-08-14 17:38:10 -07:00 |
|
Jee Jee Li
|
81f4b96481
|
[Kernel] Add cuda kernel for gpt_oss activation (#22538)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-14 17:21:29 -07:00 |
|
Jee Jee Li
|
92ff41abea
|
[Model] Modify the gate implementation of glm4_moe (#22832)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-14 05:28:50 -07:00 |
|
Isotr0py
|
7c3a0741c6
|
[Bugfix] Fix PixtralHFImagePixelInputs dynamic shape check (#22827)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-14 02:35:43 -07:00 |
|
Cyrus Leung
|
c9232d41f4
|
[CI/Build] Update VLM common tests (#22841)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-13 10:03:05 -07:00 |
|
HWH
|
9bd9294f0e
|
[Bugfix] Fix MiniCPMV Image input inference failed (#22813)
Signed-off-by: HWH <67449739+jio-H@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-13 09:41:41 -07:00 |
|
Gh0u1L5
|
b159c0a67a
|
Fix GGUF loader for Qwen3 MoE. (#22785)
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
|
2025-08-13 06:08:23 -07:00 |
|
Yuanyuan Chen
|
6772bb0f7d
|
Remove unnecessary CUDA sync of qwen image and video preprocess (#22792)
Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-13 06:07:28 -07:00 |
|
Duc-Viet Hoang
|
a01e0018b5
|
[Bugfix] Fix Nemotron VL image processing (#22739)
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
|
2025-08-13 03:11:36 -07:00 |
|
Yuxuan Zhang
|
9e7e5baaa8
|
[Model] Add missing prefix to glm4_1v (#22716)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-13 01:23:33 -07:00 |
|
zzh142857
|
d16aa3dae4
|
[Model] Add option to run Step3VisionEncoder in DP (#22697)
Signed-off-by: zzh142857 <chaorenzhaozhenghao@gmail.com>
|
2025-08-13 00:09:13 -07:00 |
|
Michael Goin
|
c6b928798e
|
Force TRTLLM attention for gpt-oss on SM100 (#22678)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-12 21:22:16 -07:00 |
|
Po-Han Huang (NVIDIA)
|
4f0f844b16
|
Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op (#22701)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-12 21:21:50 -07:00 |
|
Jee Jee Li
|
fde0b611a3
|
[Model] Decouple glm4v (#22751)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-12 17:13:17 -07:00 |
|
Harry Mellor
|
d0a6301588
|
Fix Transformers backend tensor parallel for multimodal models (#22673)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 17:12:30 -07:00 |
|
Rahul Tuli
|
5a4b4b3729
|
Add: SupportsEagle3 interface for explicit EAGLE3 support (#22642)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
|
2025-08-12 09:24:52 -07:00 |
|
wang.yuqi
|
f7ad6a1eb3
|
[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-12 05:42:58 -07:00 |
|
Harry Mellor
|
80bb1e8afe
|
Officially support SmolLM3 using the Transformers backend (#22665)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 05:38:48 -07:00 |
|
dongluw
|
9f909b8996
|
[New Model] Support Command-A-Vision (#22660)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-08-12 01:39:54 -07:00 |
|
wang.yuqi
|
6d729c43fb
|
[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-12 00:23:17 -07:00 |
|
Benji Beck
|
4678503476
|
Migrate MiniCPMVImageInputs to TensorSchema (#21939)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-11 20:43:37 -07:00 |
|
Andy Chen
|
9b94d6ec8f
|
Enable 4bit bnb prequant MOE (#21548)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-11 19:02:14 -07:00 |
|
Harry Mellor
|
458e74eb90
|
Support more parallel styles in Transformers backend TP (#22651)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-11 10:42:48 -07:00 |
|
22quinn
|
807d21b80d
|
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-11 10:31:36 -07:00 |
|
wang.yuqi
|
84cf78acee
|
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-11 09:41:37 -07:00 |
|
danielafrimi
|
14a5d903ab
|
[Model] NemotronH Support (#22349)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-08-11 04:09:24 -07:00 |
|
Cyrus Leung
|
951b038298
|
[Misc] Move jsontree to utils (#22622)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-11 03:49:32 -07:00 |
|