3604 Commits

Author SHA1 Message Date
Varun Sundar Rabindranath
3137991f55
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-21 14:28:17 -08:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Wentao Ye
1f400c58b8
[CI] Add batch invariant test to ci (#27842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 09:20:33 -07:00
rasmith
711241c13c
[CI/Build] Fix illegal memory access and unsupported test in kernels/attention/test_cache.py (#29118)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-21 10:58:38 -05:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
WeiQing Chen
b34129bf8e
[Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
2025-11-21 01:41:20 -08:00
Alex Brooks
b4734b9550
[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-11-21 13:32:30 +08:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-20 21:27:45 -08:00
Cyrus Leung
56e96b37e4
[V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:40:40 +08:00
jeremyteboul
0730414999
[Core] Add audio_embeds support to chat completions (#29059)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-11-21 11:39:47 +08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Michael Goin
87cbbdff63
Update model references for OLMo3 (#29099)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-21 09:16:52 +08:00
rasmith
c7a29d2c8d
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:44:37 +00:00
rasmith
8237ab8a2b
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:35:14 +00:00
rasmith
5e5a7eb16f
[CI/Build] Make test_attention_selector.py run tests on correct platform (#29064)
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-20 20:45:56 +00:00
rasmith
3d84ef9054
[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 20:39:49 +00:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
TJian
82b05b15e6
[BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-20 16:34:11 +00:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 (#28834)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Vensen
fb8851f254
[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu (#28760)
Signed-off-by: vensen <vensenmu@gmail.com>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
2025-11-20 02:52:02 -08:00
rasmith
322cb02872
[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 17:48:09 +08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Bradley D
1e1c06789e
[ci][amd] fix EPLB execution test (#28742)
Signed-off-by: Bradley Davis <bradleyhd@meta.com>
2025-11-20 14:53:38 +07:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00
Canlin Guo
fe25772aa9
[Bugfix] Handle broken frames in video loading (#29001)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
2025-11-20 04:38:12 +00:00
Benjamin Chislett
fcbcba6c70
[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-19 19:17:48 -08:00
liangel-02
1d642872a2
[torchao] fix safetensors for sharding (#28169)
Signed-off-by: Angel Li <liangel@meta.com>
2025-11-19 16:39:45 -08:00
Nick Hill
9ccef8e333
[Misc] Colorize logs (#29017)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-19 19:26:04 -05:00
Alexander Matveev
3aaa94ac99
[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-19 15:47:13 -08:00
Micah Williamson
22e44ad589
[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-19 21:31:33 +00:00
Shu Wang
613abb50d5
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-19 13:29:06 -08:00
Wentao Ye
1607e664f0
[Bug] Fix Batch Invariant MLA test (#28967)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 21:18:32 +00:00
Ryan Rock
68d7231991
[CI/Build] Fix test_prefix_prefill for AMD (#28905)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-11-19 16:04:36 -05:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Copilot
61728cd1df
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 13:32:19 -05:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI (#28625)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Didier Durand
09540cd918
[Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00
gnovack
d69062c67a
add support for --fully-sharded-loras in fused_moe (#28761)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-19 16:32:00 +08:00
Roman Solomatin
71d0ae1c54
[Misc] Update embedding/cross encoder tests to use mteb v2 (#27329)
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-18 22:28:40 -08:00
Matthew Bonanni
4c23690f43
[Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307)
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
2025-11-19 03:12:31 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-19 02:10:02 +00:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR (#28904)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-11-18 14:03:23 -05:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Nicolò Lucchesi
f226a3f0c1
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-18 09:22:30 -08:00