477 Commits

Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
amirkl94
03ee48111d
Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
2025-11-16 13:39:44 -05:00
Zhewen Li
1ec978c209
[Kernel][MoE Configs] Llama4 Maverick FP8 MoE config TP8 on MI325 (#28709)
Signed-off-by: Zhewen Li <zhewenli@meta.com>
2025-11-15 01:10:48 -08:00
Varun Sundar Rabindranath
6965ef436f
[Performance][DeepGEMM] Estimate expected_m (#28694)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-15 13:52:14 +08:00
Alexander Matveev
e5c78956c0
[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-14 14:13:46 -08:00
Andrey Khalyavin
fd4555089a
[BugFix] Fix typo introduced by the modular_kernel refactoring (#28728)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
2025-11-14 10:58:18 -08:00
Duncan Moss
3f8a874065
[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-14 08:02:44 -08:00
Varun Sundar Rabindranath
fe1cd7704d
[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-13 10:16:55 -08:00
Lucia Fang
7e082bc14e
Support DeepEP for Kimi-k2-thinking by enabling GEMM selection for compressed-tensors Marlin WNA16 (#28574)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-11-12 21:40:45 -08:00
Alexander Matveev
69d0e90313
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-12 23:37:24 +00:00
Varun Sundar Rabindranath
74a9a9faad
[Performance][B200] Fix DeepGEMM prologue (#27897)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-12 13:13:03 -08:00
PerryZhang01
a1e7fa362a
[EPLB][ROCm] Support EPLB for the ROCm backend (#27731)
Signed-off-by: Perry Zhang <perzhang@amd.com>
Co-authored-by: Perry Zhang <perzhang@amd.com>
2025-11-12 18:16:35 +00:00
Michael Goin
e5f599d4d1
[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-11 23:16:12 +00:00
bnellnm
a1448b4b69
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064)
2025-11-11 07:29:02 -07:00
Robert Shaw
e605e8e323
[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-11 05:59:08 +00:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion: support torch-native matching and tune thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-10 18:33:11 -05:00
Lucas Wilkinson
6dec9f6109
[BugFix] Fix DeepGEMM over-allocating workspace (#28254)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-10 17:01:17 -05:00
Sage Moore
40d33264c6
[Bugfix][EPLB] Disable shared expert overlap when EPLB is enabled (#28377)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-10 20:39:19 +00:00
jiahanc
34553b9d27
[Performance] Support FP8 FlashInfer TRTLLM MoE on Qwen3 and Qwen3-Next (#27492)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 09:21:52 -08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Xiake Sun
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
Signed-off-by: Xiake Sun <xisun@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection (#28349)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-08 19:01:11 +00:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models (#28263)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 00:33:11 +00:00
Eric Yue
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-11-06 07:29:46 -08:00
xiangze-arm
c757a15f0f
[CPU] Improve CPU fused MoE performance (#27244)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-06 11:04:18 +00:00
Xiaozhu Meng
e31946f86e
[FlashInfer] Fix FI all2all with FI CUTLASS MoE (#28166)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
2025-11-06 05:52:16 +00:00
Wentao Ye
d71af5f502
[Feature] Enable TP + EP shared_experts overlap with the router, for a 3.7% E2E performance improvement (#28164)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-05 17:21:08 -08:00
Frost Mitchell
6e97eccf5d
[XPU] Enable custom routing functions in IPEX for Llama4 (#28004)
Signed-off-by: frost-intel <frost.mitchell@intel.com>
2025-11-05 13:39:57 +00:00
amirkl94
6b7a81185d
Bugfix: Fix bad scaling factors in Cutlass FP8 FusedMoE (#27255)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-05 06:06:06 -05:00
tou
4ea62b77f5
[Qwen3-Next] MoE configs for A100-SXM4-80GB TP4 TP8 (#27740)
2025-11-05 09:25:09 +08:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses (#27123)
2025-11-04 21:59:45 +08:00
tomeras91
e4ee658672
[Model] Add optimal Triton fused MoE configs for NemotronH MoE (#27967)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-04 12:59:43 +00:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore FlashInfer autotuning for all scenarios (#27904)
2025-11-04 15:56:21 +08:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation (#28002)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-11-03 22:26:49 +00:00
Jee Jee Li
bc4486d609
[Kernel] Enable bias support in FusedMoEModularKernel (#27754)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-01 02:05:12 +00:00
Shu Wang
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
2025-10-31 17:54:29 +00:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices (#23642)
Signed-off-by: Roger Meier <r.meier@siemens.com>
2025-10-30 17:36:56 +00:00
Wentao Ye
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT (#27666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-29 16:28:27 -04:00
Wentao Ye
fcb1d570bb
[Bug] Fix DeepEP low-latency assertion failure: self.batched_router_logits.size(-1) == full_router_logits.size(-1) (#27682)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-29 14:50:39 -04:00
Wentao Ye
0484b64248
[Bug] Fix shape issue for EPLB expert weights (#27589)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-28 20:44:05 +08:00
Eric Yue
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-10-27 22:58:06 -07:00
Varun Sundar Rabindranath
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-27 07:32:50 -07:00
Danielle Robinson
9932ed6a83
[Kernel] Add split_K implementation for fused_moe_lora (#27291)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 02:05:24 -07:00
Yeshwanth N
71b1c8b667
[Chore] Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00
Alexander Matveev
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-24 00:03:14 +08:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH (#25863)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-10-23 10:27:23 +00:00
Wentao Ye
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue (#27267)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 11:00:10 -04:00