5 Commits

Author SHA1 Message Date
Pleaplusone
d9d342d214
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-26 12:45:28 +08:00

courage17340
981cadb35c
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181)
Signed-off-by: courage17340 <courage17340@163.com>
2025-11-06 17:52:13 +08:00

Lucas Wilkinson
ce75efeecb
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
2025-05-28 08:59:39 +00:00

DefTruth
e82ee40de3
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-16 03:31:39 -07:00

DefTruth
e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-11 06:50:50 -06:00