Avshalom Manevich
|
a0f8a79646
|
[fix] fix qwen image_embeds input (#21049)
Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>
|
2025-07-16 15:17:20 +00:00 |
|
Cyrus Leung
|
1c3198b6c4
|
[Model] Consolidate pooler implementations (#20927)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-16 13:39:13 +00:00 |
|
zhiweiz
|
c11013db8b
|
[Meta] Llama4 EAGLE Support (#20591)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: qizixi <qizixi@meta.com>
|
2025-07-15 21:14:15 -07:00 |
|
Peter Pan
|
1eb2b9c102
|
[CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-15 21:12:40 -07:00 |
|
Wentao Ye
|
76ddeff293
|
[Doc] Remove duplicate docstring (#21012)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-15 20:09:13 -07:00 |
|
Michael Goin
|
f46098335b
|
[Bugfix] Fix Mistral3 support on SM100/SM120 (#20998)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 20:08:41 -07:00 |
|
Ming Yang
|
fcb9f879c1
|
[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… (#20937)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-15 19:53:42 -07:00 |
|
Brayden Zhong
|
75a99b98bf
|
[Chore] Remove outdated transformers check (#20989)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-07-15 19:42:40 -07:00 |
|
Thomas Parnell
|
6cbc4d4bea
|
[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-15 19:19:10 -07:00 |
|
Tuan, Hoang-Trong
|
f29fd8a7f8
|
[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
|
2025-07-15 16:08:26 -04:00 |
|
Patrick von Platen
|
e7e3e6d263
|
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-15 07:35:30 -07:00 |
|
Thomas Parnell
|
3534c39a20
|
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-15 04:04:35 -07:00 |
|
Jennifer He
|
85bd6599e4
|
[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)
Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
|
2025-07-15 13:34:24 +08:00 |
|
Ruheena Suhani Shaik
|
016b8d1b7f
|
Enabled BnB NF4 inference on Gaudi (#20172)
Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
|
2025-07-14 20:26:08 -07:00 |
|
XiongfeiWei
|
d4170fad39
|
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-07-15 03:06:33 +00:00 |
|
Michael Goin
|
bcdfb2a330
|
[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 01:42:17 +00:00 |
|
Thomas Parnell
|
86f3ac21ce
|
Fix overflow indexing in causal_conv1d kernel (#20938)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-14 21:43:07 +00:00 |
|
Varun Sundar Rabindranath
|
c0569dbc82
|
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-14 19:47:16 +00:00 |
|
ant-yy
|
38efa28278
|
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
|
2025-07-14 22:10:32 +08:00 |
|
Jee Jee Li
|
a99b9f7dee
|
[Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-14 07:34:34 +00:00 |
|
TJian
|
80d38b8ac8
|
[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs (#20880)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-07-13 15:19:32 +00:00 |
|
Minkyu Kim
|
bd4c1e6fdb
|
Support for LlamaForSequenceClassification (#20807)
Signed-off-by: thechaos16 <thechaos16@gmail.com>
|
2025-07-13 00:09:34 -07:00 |
|
Nicolò Lucchesi
|
020f58abcd
|
[Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-12 19:40:11 -07:00 |
|
Wentao Ye
|
c1acd6d7d4
|
[Refactor] Change the way of import triton (#20774)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-12 19:39:55 -07:00 |
|
ElizaWszola
|
3b3b778d4a
|
[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs (#20825)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-12 19:39:14 -07:00 |
|
Wentao Ye
|
42d440c22b
|
[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-12 19:38:45 -07:00 |
|
Michael Goin
|
6e2c176e1f
|
[Bugfix] Restrict Machete to only run on Hopper (#20830)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-12 17:34:40 +00:00 |
|
Alex Brooks
|
c2a2f19aba
|
[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-07-12 06:11:30 -07:00 |
|
Congcong Chen
|
2c11a738b3
|
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
|
2025-07-12 06:02:10 -07:00 |
|
Zhiyu
|
4afe687a82
|
Enable ModelOpt Llama4 fp8 checkpoint deployment (#20419)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-07-11 23:07:16 -07:00 |
|
Wentao Ye
|
0d4891cd03
|
[Bug] Fix DeepGemm for EP low latency case (#20833)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-11 23:05:12 -07:00 |
|
Nicolò Lucchesi
|
3c7d942da8
|
[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-11 21:33:26 -07:00 |
|
Varun Sundar Rabindranath
|
890323dc1b
|
[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once (#20852)
|
2025-07-11 20:56:24 -07:00 |
|
Isotr0py
|
01cae37713
|
[CI/Build] Ensure compatibility with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-11 20:53:07 -07:00 |
|
yurhett
|
11c0198615
|
[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-07-11 20:52:43 -07:00 |
|
Li, Jiang
|
b1235c3e10
|
[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-11 20:52:05 -07:00 |
|
Jee Jee Li
|
44d02f54db
|
[Misc] Restrict deep_gemm's log output (#20827)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-11 20:50:42 -07:00 |
|
Varun Sundar Rabindranath
|
53fa457391
|
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-11 07:51:46 -07:00 |
|
Jee Jee Li
|
8020e98c9f
|
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-11 08:01:13 +00:00 |
|
nopperl
|
5d09152ff1
|
[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-07-11 05:53:31 +00:00 |
|
Luka Govedič
|
31d5c1797f
|
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 04:56:28 +00:00 |
|
Wentao Ye
|
e2de455c34
|
[Feature] Integrate SM100 DeepGEMM support (#20087)
|
2025-07-10 20:18:05 -07:00 |
|
Michael Goin
|
922f316441
|
[Model] Support HF format of minimax (#20211)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 02:55:21 +00:00 |
|
Duncan Moss
|
5923ab9524
|
[fix]: disable cutlass block scaled group gemm for EP (#20781)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
|
2025-07-11 02:39:18 +00:00 |
|
Simon Mo
|
b854321ffe
|
[Docs] Lazy import gguf (#20785)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-10 16:06:37 -07:00 |
|
Varun Sundar Rabindranath
|
f0c98cae27
|
[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 14:40:38 -07:00 |
|
Varun Sundar Rabindranath
|
fdadb6f43a
|
[Bugfix] Fused MoE Modular Kernel chunking loop (#20392)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 20:31:10 +00:00 |
|
Ming Yang
|
3de2ed767f
|
[Bugfix] Remove assertion of expert_map being None (#20714)
Signed-off-by: Ming Yang <yming@meta.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-10 19:55:22 +00:00 |
|
Nathan Hoos
|
d6902ce79f
|
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
|
2025-07-10 15:30:26 -04:00 |
|
Sanger Steel
|
5e53c89a74
|
[Bugfix] [CI] Fix Tensorizer LoRA test (#20760)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-10 19:07:06 +00:00 |
|