Cyrus Leung
56e96b37e4
[V0 Deprecation] Remove best_of ( #29090 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:40:40 +08:00
jeremyteboul
0730414999
[Core] Add audio_embeds support to chat completions ( #29059 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-11-21 11:39:47 +08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab ( #28545 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Michael Goin
87cbbdff63
Update model references for OLMo3 ( #29099 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-21 09:16:52 +08:00
rasmith
c7a29d2c8d
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py ( #29022 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:44:37 +00:00
rasmith
8237ab8a2b
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now ( #29021 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:35:14 +00:00
rasmith
5e5a7eb16f
[CI/Build] Make test_attention_selector.py run tests on correct platform ( #29064 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-20 20:45:56 +00:00
rasmith
3d84ef9054
[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py ( #29043 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 20:39:49 +00:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks ( #27743 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
TJian
82b05b15e6
[BugFix] [FEAT] Enable fastsafetensors for ROCm platform ( #28225 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-20 16:34:11 +00:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 ( #28834 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks ( #28951 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Vensen
fb8851f254
[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu ( #28760 )
...
Signed-off-by: vensen <vensenmu@gmail.com>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
2025-11-20 02:52:02 -08:00
rasmith
322cb02872
[CI/Build][AMD] Fix import errors in tests/kernels/attention ( #29032 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 17:48:09 +08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache ( #29038 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Bradley D
1e1c06789e
[ci][amd] fix EPLB execution test ( #28742 )
...
Signed-off-by: Bradley Davis <bradleyhd@meta.com>
2025-11-20 14:53:38 +07:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat ( #28964 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00
Canlin Guo
fe25772aa9
[Bugfix] Handle broken frames in video loading ( #29001 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
2025-11-20 04:38:12 +00:00
Benjamin Chislett
fcbcba6c70
[Feat] Iteration-level profiling for Torch and CUDA profiler ( #28987 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-19 19:17:48 -08:00
liangel-02
1d642872a2
[torchao] fix safetensors for sharding ( #28169 )
...
Signed-off-by: Angel Li <liangel@meta.com>
2025-11-19 16:39:45 -08:00
Nick Hill
9ccef8e333
[Misc] Colorize logs ( #29017 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-19 19:26:04 -05:00
Alexander Matveev
3aaa94ac99
[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier ( #28687 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-19 15:47:13 -08:00
Micah Williamson
22e44ad589
[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm ( #28984 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-19 21:31:33 +00:00
Shu Wang
613abb50d5
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked ( #25990 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-19 13:29:06 -08:00
Wentao Ye
1607e664f0
[Bug] Fix Batch Invariant MLA test ( #28967 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 21:18:32 +00:00
Ryan Rock
68d7231991
[CI/Build] Fix test_prefix_prefill for AMD ( #28905 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-11-19 16:04:36 -05:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support ( #28718 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Copilot
61728cd1df
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests ( #28966 )
...
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 13:32:19 -05:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default ( #26468 )
...
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI ( #28625 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Didier Durand
09540cd918
[Doc]: fix typos in various files ( #29010 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00
gnovack
d69062c67a
add support for --fully-sharded-loras in fused_moe ( #28761 )
...
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-19 16:32:00 +08:00
Roman Solomatin
71d0ae1c54
[Misc] Update embedding/cross encoder tests to use mteb v2 ( #27329 )
...
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-18 22:28:40 -08:00
Matthew Bonanni
4c23690f43
[Attention] FlashAttention ViT support, make default backend ( #28763 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 ( #27307 )
...
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
2025-11-19 03:12:31 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams ( #28914 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-19 02:10:02 +00:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility ( #26985 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR ( #28904 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-11-18 14:03:23 -05:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 ( #28921 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Nicolò Lucchesi
f226a3f0c1
[CI][NIXL] Change default block_size for tests ( #28927 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-18 09:22:30 -08:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support ( #27772 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
Alex
f6aa122698
[CI Sprint] Quantization CI Cleanup ( #24130 )
...
Signed-off-by: Alex Yun <alexyun04@gmail.com>
2025-11-18 09:21:48 -05:00
Isotr0py
896e41ae04
[CI/Build] Replace wikipedia url with local server ones ( #28908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:10:55 +00:00
Nick Hill
5bdd155277
[CI] Fix async scheduling + spec decoding test flake ( #28902 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-18 05:26:32 +00:00
Cyrus Leung
bf9e1e8767
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields ( #28872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-17 20:30:29 -08:00
Benjamin Bartels
b6e04390d3
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing ( #28831 )
...
Signed-off-by: Thomas Mao <yiyeguhu@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Thomas Mao <yiyeguhu@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-17 19:13:25 -08:00
Pranav
f77bce001a
[Model] Add Afmoe architecture implementation ( #28332 )
...
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Signed-off-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
2025-11-17 15:11:20 -08:00
Wentao Ye
a289cc1dde
[Test] Batch Invariant: Rename and organize tests ( #27421 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-17 18:09:47 -05:00