mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-13 14:35:00 +08:00
This PR addresses the Marlin kernel H100 crash that was reported here: neuralmagic#187. The reason for the crash was the inline PTX assembly that introduced the async_copy with streaming behavior. The solution is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). There is no performance difference between standard async_copy PTX and the previous one.