Boyuan Feng
|
a903d59ffa
|
cleanup at::Tag::needs_fixed_stride_order (#28974)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-20 02:51:36 -08:00 |
|
rasmith
|
322cb02872
|
[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 17:48:09 +08:00 |
|
Wentao Ye
|
2c52c7fd9a
|
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-20 16:52:23 +08:00 |
|
Bradley D
|
1e1c06789e
|
[ci][amd] fix EPLB execution test (#28742)
Signed-off-by: Bradley Davis <bradleyhd@meta.com>
|
2025-11-20 14:53:38 +07:00 |
|
Pleaplusone
|
7218f83992
|
[ROCm][BugFix] Fix shared expert loading error when disabling VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS (#28633)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-20 14:50:23 +07:00 |
|
Cyrus Leung
|
20e4497be2
|
[V0 Deprecation] Remove num_lookahead_slots (#29000)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-20 06:39:10 +00:00 |
|
Quentin Gallouédec
|
1c7bcc55b8
|
[Frontend] Allow parsed tool arguments (#28820)
Signed-off-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-19 22:20:12 -08:00 |
|
Lukas Geiger
|
a9705a290a
|
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-19 22:04:23 -08:00 |
|
Isotr0py
|
64192d5624
|
[Bugfix] Revert custom attention mask for gemma3-mm (#28995)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-20 13:23:22 +08:00 |
|
Canlin Guo
|
fe25772aa9
|
[Bugfix] Handle broken frames in video loading (#29001)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>
|
2025-11-20 04:38:12 +00:00 |
|
prashanth058
|
0cca9b4d13
|
[Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM (#28972)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
|
2025-11-20 03:50:37 +00:00 |
|
Shengliang Xu
|
a8c536829c
|
Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
|
2025-11-19 22:39:36 -05:00 |
|
Benjamin Chislett
|
fcbcba6c70
|
[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-19 19:17:48 -08:00 |
|
Fadi Arafeh
|
3168285fca
|
[cpu][ci] Add initial set of tests for Arm CPUs (#28657)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-11-20 02:37:09 +00:00 |
|
Qiang Zhang
|
3fb0d90999
|
[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715)
Signed-off-by: chiangzhang <chiangzhang@tencent.com>
|
2025-11-20 02:11:52 +00:00 |
|
Kuntai Du
|
05c2dee7e9
|
[DeepSeek + LMCache Multiprocess] handle MLA for deepseek model + LMCache Multiprocess connector (#29039)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-11-20 01:40:49 +00:00 |
|
liangel-02
|
1d642872a2
|
[torchao] fix safetensors for sharding (#28169)
Signed-off-by: Angel Li <liangel@meta.com>
|
2025-11-19 16:39:45 -08:00 |
|
Nick Hill
|
9ccef8e333
|
[Misc] Colorize logs (#29017)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-19 19:26:04 -05:00 |
|
Jialin Ouyang
|
537cc635c7
|
[GC Debugger] Simplify and improve GC Debugger Utils (#29029)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 00:10:22 +00:00 |
|
Wentao Ye
|
5031cd5d55
|
[Refactor] Optimize select_experts (#28069)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-19 18:53:15 -05:00 |
|
Alexander Matveev
|
3aaa94ac99
|
[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-19 15:47:13 -08:00 |
|
JartX
|
8e38e99829
|
[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849)
|
2025-11-19 18:30:08 -05:00 |
|
Wentao Ye
|
0075bfffd4
|
[CI] Fix precommit rope_theta issue (#29040)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-19 14:22:43 -08:00 |
|
Max Hu
|
cb0a7b4bea
|
[Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` (#29018)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
|
2025-11-19 21:54:15 +00:00 |
|
Lucas Wilkinson
|
8f4f77a727
|
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-19 13:43:54 -08:00 |
|
Micah Williamson
|
22e44ad589
|
[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-19 21:31:33 +00:00 |
|
Yongye Zhu
|
88f5b19f0b
|
[DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-11-19 16:30:04 -05:00 |
|
Shu Wang
|
613abb50d5
|
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-19 13:29:06 -08:00 |
|
Julien Denize
|
cdeec2e606
|
[BugFix] Ray with multiple nodes (#28873)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-11-19 21:20:58 +00:00 |
|
Wentao Ye
|
1607e664f0
|
[Bug] Fix Batch Invariant MLA test (#28967)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-19 21:18:32 +00:00 |
|
Ryan Rock
|
68d7231991
|
[CI/Build] Fix test_prefix_prefill for AMD (#28905)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-11-19 16:04:36 -05:00 |
|
Qiu
|
2fd893b4ce
|
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
|
2025-11-19 15:52:44 -05:00 |
|
Izzy Putterman
|
02f5903b84
|
Eagle: MM Cuda Graphs with MRope (#28896)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-19 15:01:05 -05:00 |
|
Aleksandr Malyshev
|
ac10fd3c69
|
Upstreaming aiter triton attention backend as a new backend (#28701)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-11-19 19:59:30 +00:00 |
|
杰兮
|
9d2d561257
|
[Bugfix] Fix precision corruption when shared_experts_stream=None (#28942)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
|
2025-11-19 19:30:57 +00:00 |
|
Robert Shaw
|
fe69f331f8
|
[Kernels] Improve H200 Fused MoE Config (#28992)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-19 19:23:54 +00:00 |
|
Jialin Ouyang
|
3319a493fc
|
[Core] Reuse created spec tokens lists to mitigate GC cost (#28917)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-19 19:20:22 +00:00 |
|
Copilot
|
61728cd1df
|
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-19 13:32:19 -05:00 |
|
Yuxuan Zhang
|
0c80efd94f
|
GLM-V video segmentation solution adjustment (#28941)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-11-19 17:32:55 +00:00 |
|
Harry Mellor
|
a8b70304d6
|
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 09:06:36 -08:00 |
|
Shanshan Shen
|
d44e9df7d4
|
[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-11-19 16:24:55 +00:00 |
|
Lucas Wilkinson
|
48fc8b1e59
|
[BugFix] Fix async-scheduling + FlashAttn MLA (#28990)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-19 10:04:07 -05:00 |
|
vnadathur
|
1ffe934c8a
|
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-19 06:13:54 -08:00 |
|
Yanan Cao
|
2c8b9182b5
|
[CI] Reorganize compile tests so new tests are automatically included in CI (#28625)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-19 06:13:50 -08:00 |
|
Harry Mellor
|
4f5299f717
|
Relax Transformers modeling backend MoE experts check (#28952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 21:50:30 +08:00 |
|
Didier Durand
|
09540cd918
|
[Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-19 04:56:21 -08:00 |
|
Chen Bruce
|
da2f6800e0
|
[Feat][Perf] Enable deepep-low-latency with round-robin expert placement. (#28449)
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 13:46:24 +01:00 |
|
Tova Movshovitz
|
ba558c029a
|
[config] Expose get_total_num_hidden_layers() in ModelConfig (#28961)
Signed-off-by: tovam <tovam@pliops.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-19 11:37:11 +00:00 |
|
Harry Mellor
|
97cfa99d59
|
[Docs] Take env var definition out of folded admonition (#29005)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 03:32:04 -08:00 |
|
j20120307
|
bbc6c2f1e5
|
[CI/Build] Fix broken build on Apple M1 (#28999)
Signed-off-by: Kan Zhu <j20120307@gmail.com>
|
2025-11-19 11:07:22 +00:00 |
|