amirkl94
6b7a81185d
Bugfix: Cutlass FP8 FusedMoE bad scaling factors ( #27255 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-05 06:06:06 -05:00
Eric Yue
b57789b62b
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message ( #27635 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-11-05 19:03:51 +08:00
Chauncey
377061d481
[Misc] fix import error for DeepSeekR1ReasoningParser ( #28114 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 19:02:32 +08:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-11-05 10:36:31 +00:00
Qiu
16b37f3119
[bugfix] fix wrong dcp_local_seq_lens calc ( #27518 )
...
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
2025-11-05 17:58:13 +08:00
Chauncey
0976711f3b
[Refactor] to simplify and extract the shared logic between chat completion and responses ( #27961 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 15:46:39 +08:00
Chauncey
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 15:37:02 +08:00
Alex Brooks
b7cbc25416
[Model, Core] Support Granite Speech & LoRA for STT ( #24455 )
2025-11-05 08:33:48 +01:00
Lucas Wilkinson
d43ad5a757
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) ( #28100 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-05 14:54:43 +08:00
Isotr0py
0ff05e3770
[Bugfix] Fix encoder-only model support for transformers backend ( #28021 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-04 22:24:41 -08:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-04 20:51:16 -08:00
Zhewen Li
878fd5a16f
[CI/Build] Enable some fixed tests in AMD CI ( #28078 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-05 03:15:59 +00:00
Kunshang Ji
18b39828d9
[XPU] Add gpt-oss model support for Intel GPU ( #27786 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-05 02:17:23 +00:00
tou
4ea62b77f5
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 ( #27740 )
2025-11-05 09:25:09 +08:00
Vadim Gimpelson
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-04 15:58:23 -08:00
Aleksandr Malyshev
2d977a7a9e
[ROCm] gemm_a16w16 upstreaming ( #26969 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-11-04 16:01:00 -05:00
Chenheli Hua
1fb4217a05
[Multimodal] Make MediaConnector extensible. ( #27759 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-11-04 18:28:01 +00:00
nadavkluger
611c86ea3c
Added disable rule to track files under benchmarks/lib ( #28048 )
...
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai>
2025-11-04 18:18:43 +00:00
Pleaplusone
dc937175d4
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation ( #25763 )
...
Signed-off-by: ganyi <ygan@amd.com>
2025-11-04 18:05:33 +00:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-04 18:01:56 +00:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 17:06:28 +00:00
Nick Hill
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 08:33:55 -08:00
yt0428
05cae69f0f
[model] Add support for openPangu_Ultra_MoE ( #27521 )
...
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 08:17:20 -08:00
Vadim Gimpelson
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-04 08:11:41 -08:00
lyrisz
97e3dda84b
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM ( #27284 )
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-04 07:49:25 -08:00
Nick Hill
5a0a6dfd55
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size ( #28025 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 07:38:16 -08:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
tomeras91
e4ee658672
[Model] add optimal triton fused moe configs for NemotronH MoE ( #27967 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-04 12:59:43 +00:00
tomeras91
77f8001f53
[Model][Bugfix] fix pipeline parallelism support for NemotronH ( #27968 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-04 12:28:36 +00:00
Zhuohan Li
300a265978
[Core] Enable StatLogger in LLMEngine ( #28020 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-11-04 04:13:35 -08:00
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-04 06:00:57 -05:00
yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately ( #27435 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
Zhewen Li
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI ( #28022 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-04 07:20:50 +00:00
CSWYF3634076
43a6acfb7d
[Model] fix ernie45 reasoning_parser ( #27973 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-11-04 07:16:46 +00:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-03 23:00:49 -08:00
Zhewen Li
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI ( #27944 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-04 06:36:40 +00:00
xiangze-arm
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance ( #27240 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-04 06:33:23 +00:00
Wentao Ye
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) ( #27884 )
2025-11-04 14:05:55 +08:00
Mark McLoughlin
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode ( #27867 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-03 20:35:36 -08:00
liuzhenwei
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 ( #27849 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-11-04 03:28:35 +00:00
Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser ( #27974 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-04 10:10:10 +08:00
li2haipeng
6ddae74054
[LoRA] Lora shrink swizzle ( #27694 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
Signed-off-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 09:30:20 +08:00
vllmellm
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm ( #27748 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-03 17:12:19 -08:00
QiliangCui
7956b0c0bc
Remove the tpu docker image nightly build. ( #27997 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-11-04 00:35:54 +00:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation ( #28002 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-11-03 22:26:49 +00:00
Hank_
ccd3e55e51
[Bugfix][plugin] fla crash on plugin ( #27322 )
2025-11-04 05:27:03 +08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests ( #27683 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 13:04:40 -08:00
Ning Xie
786030721e
[Docs] add runai_streamer_sharded to LoadConfig ( #27937 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-03 20:35:16 +00:00
Matthew Bonanni
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold ( #27777 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 15:17:10 -05:00