Wentao Ye
61249b177d
[Refactor] Remove useless syncwarp ( #30510 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 17:43:41 -05:00
Wentao Ye
0ee6416f67
[Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement ( #30159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:44:01 -05:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Ming Yang
527821d191
Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu ( #25346 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-23 09:45:39 -07:00
Lumina
81b16a2bc9
[Kernel] Better inf handling for grouped topk cu ( #24886 )
...
Signed-off-by: lumina37 <starry.qvq@gmail.com>
2025-09-18 05:53:55 +00:00
Qiming Zhang
e919d6f549
[Kernel][Bugfix] Fix grouped topk cu ( #24146 )
...
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
2025-09-04 12:37:37 +08:00
Xin Yang
8a3cd90af5
[Kernel] Add fused grouped_topk kernel for MoE ( #23274 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-25 11:47:52 -07:00