Wentao Ye
1607e664f0
[Bug] Fix Batch Invariant MLA test ( #28967 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 21:18:32 +00:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support ( #28718 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Izzy Putterman
02f5903b84
Eagle: MM Cuda Graphs with MRope ( #28896 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-19 15:01:05 -05:00
Aleksandr Malyshev
ac10fd3c69
Upstreaming aiter triton attention backend as a new backend ( #28701 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-11-19 19:59:30 +00:00
杰兮
9d2d561257
[Bugfix] Fix precision corruption when shared_experts_stream=None ( #28942 )
...
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-11-19 19:30:57 +00:00
Robert Shaw
fe69f331f8
[Kernels] Improve H200 Fused MoE Config ( #28992 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-19 19:23:54 +00:00
Jialin Ouyang
3319a493fc
[Core] Reuse created spec tokens lists to mitigate GC cost ( #28917 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-19 19:20:22 +00:00
Yuxuan Zhang
0c80efd94f
GLM-V video segmentation solution adjustment ( #28941 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-11-19 17:32:55 +00:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Shanshan Shen
d44e9df7d4
[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device ( #26487 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-19 16:24:55 +00:00
Lucas Wilkinson
48fc8b1e59
[BugFix] Fix async-scheduling + FlashAttn MLA ( #28990 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-19 10:04:07 -05:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default ( #26468 )
...
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI ( #28625 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Harry Mellor
4f5299f717
Relax Transformers modeling backend MoE experts check ( #28952 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 21:50:30 +08:00
Didier Durand
09540cd918
[Doc]: fix typos in various files ( #29010 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00
Chen Bruce
da2f6800e0
[Feat][Perf] Enable deepep-low-latency with round-robin expert placement. ( #28449 )
...
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 13:46:24 +01:00
Tova Movshovitz
ba558c029a
[config] Expose get_total_num_hidden_layers() in ModelConfig ( #28961 )
...
Signed-off-by: tovam <tovam@pliops.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-19 11:37:11 +00:00
gnovack
d69062c67a
add support for --fully-sharded-loras in fused_moe ( #28761 )
...
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-19 16:32:00 +08:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files ( #28945 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-18 22:52:30 -08:00
Lukas Geiger
3d4e7d34be
[Model][QwenVL] Simplify cos/sin rotary embedding indexing ( #28962 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 05:43:01 +00:00
Gleb Kurchanov
73ff872db0
[Bugfix] Fix typo in Qwen3 Next model executor ( #28960 )
...
Signed-off-by: Gleb Kurchanov <nepherpitou@gmail.com>
2025-11-19 05:21:02 +00:00
Xin Yang
468a8d72ba
[Bugfix] Fix FusedMoEModularKernel for triton backend ( #28913 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-11-19 13:05:22 +08:00
Matthew Bonanni
4c23690f43
[Attention] FlashAttention ViT support, make default backend ( #28763 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 ( #27307 )
...
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
2025-11-19 03:12:31 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams ( #28914 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-19 02:10:02 +00:00
Jerry Zhang
da94c7c0eb
Move online quantization to model.load_weights ( #26327 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-18 16:52:41 -08:00
tomeras91
1395461f5f
[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op ( #28587 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath
9912b8ccb8
[Build] Add OpenAI triton_kernels ( #28788 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-18 16:45:20 -08:00
Michael Goin
67745d189f
Suppress verbose logs from model_hosting_container_standards ( #28949 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-18 12:29:06 -08:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility ( #26985 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR ( #28904 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-11-18 14:03:23 -05:00
Isotr0py
e4bb2684bc
[Models] Replace all nn.Conv2d with vLLM's Conv2dLayer ( #28842 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 18:56:04 +00:00
vllmellm
0af3d4f0df
[FEAT] [AITER] [ROCm] integrate aiter sampling ops ( #26084 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-18 17:28:34 +00:00
Nick Hill
da8dadf68b
[Minor] Rename ec_producer field to is_ec_producer ( #28884 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-18 17:26:07 +00:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support ( #27772 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
Nicolò Lucchesi
184b12fdc6
[Bugfix][NIXL] Fix block_size_ratio when logical != physical blocks ( #28925 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-18 22:07:50 +08:00
Canlin Guo
b9489f51e1
[Model][Perf] Use cos and sin cache in QwenVL ( #28798 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-18 11:51:54 +00:00
Song Zhixin
285eaa4285
[Bugfix] Safeguard against missing backend in AttentionBackendEnum ( #28846 )
...
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Song Zhixin <szxfml@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 10:53:44 +00:00
Nick Hill
439368496d
[BugFix] Fix PP/async scheduling with pooling models ( #28899 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-18 00:20:45 -08:00
Ning Xie
0168f69e50
[Misc] Remove unnecessary parentheses from log statements ( #28897 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-17 20:33:46 -08:00
Didier Durand
083cf326dc
[Doc]: fix typos in various files ( #28863 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-17 20:32:14 -08:00
Cyrus Leung
bf9e1e8767
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields ( #28872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-17 20:30:29 -08:00
Wentao Ye
3ddcf46011
[Refactor] Remove Unused Func in Batch Invariant ( #28881 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-17 20:29:29 -08:00
xuebwang-amd
d0a73620cc
[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss ( #28638 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 11:16:45 +08:00
Benjamin Bartels
b6e04390d3
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing ( #28831 )
...
Signed-off-by: Thomas Mao <yiyeguhu@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Thomas Mao <yiyeguhu@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-17 19:13:25 -08:00
Zhuohan Li
552cac95b5
[Misc] Fix wrong comment in scheduler ( #28880 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-11-17 15:32:22 -08:00
Bangsheng Tang
61485844fc
[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp > 1 ( #28774 )
2025-11-17 15:22:11 -08:00
Pranav
f77bce001a
[Model] Add Afmoe architecture implementation ( #28332 )
...
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Signed-off-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
2025-11-17 15:11:20 -08:00
Shreyas Kulkarni
95ae50b7d1
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle ( #28435 )
...
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
2025-11-17 15:01:34 -08:00