xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-08 18:35:41 +08:00

Author	SHA1	Message	Date
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Ning Xie	326976291b	[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-10 00:08:48 -07:00
Jee Jee Li	0edc0cd52b	[Bugfix] Fix CI moe kernel failure (#22556) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-09 00:03:29 -07:00
Yongye Zhu	e789cad6b8	[gpt-oss] triton kernel mxfp4 (#22421) Signed-off-by: <zyy1102000@gmail.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-08 08:24:07 -07:00
Wentao Ye	6e8d8c4afb	[Test] Add Unit Test for Batched DeepGEMM (#21559) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-02 10:45:46 +08:00
Wentao Ye	3700642013	[Refactor] Remove Duplicate `per_block_cast_to_fp8`, Remove Dependencies of DeepGEMM (#21787) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 01:13:27 +00:00
Matthew Bonanni	e360316ab9	Add DeepGEMM to Dockerfile in vllm-base image (#21533) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 18:01:55 -07:00
Caleb_Du	57c22e57f9	Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-07-27 07:08:00 -07:00
Yang Chen	6929f8b437	[Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246) Signed-off-by: Yang Chen <yangche@fb.com>	2025-07-23 01:41:43 -07:00
Ming Yang	e7b2042681	Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) (#21334) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-21 21:49:01 -07:00
shixianc	7d94577138	Add torch golden impl for moe_align_block_size kernel test (#20653) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-07-19 02:32:36 -07:00
shixianc	5780121c95	[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-07-18 04:34:43 +00:00
ElizaWszola	9fb2d22032	[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-17 09:56:44 -04:00
Varun Sundar Rabindranath	11dfdf21bf	[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-17 08:10:37 +00:00
Peter Pan	1eb2b9c102	[CI] update typos config for CI pre-commit and fix some spells (#20919) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-07-15 21:12:40 -07:00
Wentao Ye	c1acd6d7d4	[Refactor] Change the way of import triton (#20774) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:39:55 -07:00
Wentao Ye	42d440c22b	[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:38:45 -07:00
Varun Sundar Rabindranath	53fa457391	[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-11 07:51:46 -07:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087)	2025-07-10 20:18:05 -07:00
Varun Sundar Rabindranath	f0c98cae27	[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 14:40:38 -07:00
Varun Sundar Rabindranath	fdadb6f43a	[Bugfix] Fused MoE Modular Kernel chunking loop (#20392) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 20:31:10 +00:00
fxmarty-amd	332d4cb17b	[Feature][Quantization] MXFP4 support for MOE models (#17888) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: Bowen Bao <bowenbao@amd.com> Signed-off-by: Felix Marty <Felix.Marty@amd.com> Co-authored-by: Bowen Bao <bowenbao@amd.com>	2025-07-09 13:19:02 -07:00
Ming Yang	afb7cff1b9	[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167) Signed-off-by: Ming Yang <yming@meta.com>	2025-07-08 01:07:22 +00:00
Michael Goin	c108781c85	[CI Bugfix] Fix pre-commit failures on main (#20502)	2025-07-04 14:17:30 -07:00
Duncan Moss	3d184b95b8	[feat]: CUTLASS block scaled group gemm for SM100 (#19757) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Co-authored-by: Duncan Moss <dmoss@nvidia.com>	2025-07-04 12:58:04 -06:00
Jee Jee Li	1caca5a589	[Misc] Add SPDX-FileCopyrightText (#20428) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-04 07:40:42 +00:00
bnellnm	78fe77534b	[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-07-03 14:55:40 -07:00
bnellnm	c1909e7e8c	[Kernels] MoE refactor (#19636) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-07-02 06:08:27 -07:00
Wentao Ye	7058d7dd5d	[Refactor] Remove duplicate `find_free_port` (#20333) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:07 -07:00
Varun Sundar Rabindranath	08d81f1014	[Bugfix] Fix deepep tests (#20288) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-01 15:29:08 +08:00
Wentao Ye	551ef1631a	[Unit Test] Add unit test for deep gemm (#20090) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-30 10:26:42 -06:00
Wentao Ye	4d36693687	[Refactor] Create a function util and cache the results for `has_deepgemm`, `has_deepep`, `has_pplx` (#20187) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-28 22:06:38 +00:00
Wentao Ye	562308816c	[Refactor] Rename commnication utils (#20091) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 22:19:32 +00:00
Wentao Ye	c894c5dc1f	[Bug Fix] Fix address/port already in use error for deep_ep test (#20094) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 22:33:13 +08:00
bnellnm	015fab8c2f	[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-24 23:22:58 -07:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820)" (#20049) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
Tyler Michael Smith	68aaeb3749	[EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-23 11:07:47 -07:00
Wentao Ye	ffb2cd6b54	[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-17 11:49:26 -07:00
bnellnm	29fa5cac1c	[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-11 12:53:10 -04:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
bnellnm	c6c10ca920	[Bugfix] Reduce moe_sum test size to avoid OOM (#18484) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-21 06:46:39 -07:00
bnellnm	92247c522e	[Bug] Fix moe_sum signature (#18440) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-20 22:37:08 -07:00

59 Commits