Varun Sundar Rabindranath
e8db44f883
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP ( #24588 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-09-22 21:01:09 -07:00
bnellnm
4bdf400218
[Bugfix] Fix chunked a2_scales in modular kernels ( #25264 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-19 19:42:01 +00:00
bnellnm
dc2979c585
[Kernels] Overlap shared experts with combine instead of dispatch ( #24254 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-18 12:10:21 +08:00
bnellnm
5963b98b46
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses ( #22537 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
Matthew Bonanni
d119fc8614
[CI][Bugfix] Fix failing Blackwell test ( #24993 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 15:55:02 -07:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM ( #23693 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Didier Durand
f4962a6d55
[Doc]: fix typos in Python comments ( #24417 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00
bnellnm
e9b92dcd89
[Kernels] Overlap shared experts with send/recv ( #23273 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
bnellnm
8ad7285ea2
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. ( #22035 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
Shu Wang
c7d8724e78
[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) ( #20037 )
...
Signed-off-by: shuw <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-17 21:32:45 -07:00
Varun Sundar Rabindranath
11dfdf21bf
[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels ( #20903 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-17 08:10:37 +00:00
Varun Sundar Rabindranath
c0569dbc82
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts ( #20725 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-14 19:47:16 +00:00
Varun Sundar Rabindranath
f0c98cae27
[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce ( #20648 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-10 14:40:38 -07:00
Varun Sundar Rabindranath
fdadb6f43a
[Bugfix] Fused MoE Modular Kernel chunking loop ( #20392 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-10 20:31:10 +00:00
Varun Sundar Rabindranath
805d62ca88
[Misc] DP : Add ExpertTokensMetadata ( #20332 )
...
Signed-off-by: Varun <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
2025-07-10 00:33:14 +00:00
bnellnm
78fe77534b
[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. ( #18864 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-07-03 14:55:40 -07:00
bnellnm
c1909e7e8c
[Kernels] MoE refactor ( #19636 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-07-02 06:08:27 -07:00
Ye (Charlotte) Qi
33d51f599e
[BugFix] Add an env to disable moe chunking to work around compile incompatibility ( #19642 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-22 15:17:49 -07:00
bnellnm
5e5baa91aa
[Kernels] Use empty for modular MoE workspaces ( #19667 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-16 14:58:01 +00:00
Varun Sundar Rabindranath
e3b12667d4
[BugFix] : Fix Batched DeepGemm Experts ( #19515 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-12 20:43:02 -06:00
bnellnm
29fa5cac1c
[Kernels] Add activation chunking logic to FusedMoEModularKernel ( #19168 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-11 12:53:10 -04:00
ElizaWszola
84166fee97
[Kernel] Integrate CUTLASS MoE kernel with PPLX ( #18762 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-06 18:26:11 -07:00
Varun Sundar Rabindranath
fa98d77773
[Kernel] DeepEP dispatch-combine kernel integration ( #18434 )
...
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-03 12:30:02 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels ( #15956 )
2025-05-14 13:11:54 -07:00