vllm/csrc at c1dc547129f5faaa2ca5ba557145b8ec8838693c - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 05:57:12 +08:00

Matt Wong 59a6abf3c9

[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)

2024-04-08 14:31:02 -07:00

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

[Bugfix] Add kv_scale input parameter to CPU backend (#3840)

2024-04-04 04:33:08 +00:00

Add fused top-K softmax kernel for MoE (#2769)

2024-02-05 17:38:02 -08:00

[Kernel] support non-zero cuda devices in punica kernels (#3636)

2024-03-27 00:37:42 +00:00

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

activation_kernels.cu

Add kernel for GeGLU with approximate GELU (#3337)

2024-03-12 22:06:17 -07:00

cache_kernels.cu

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

cache.h

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

cuda_compat.h

[ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262)

2024-03-10 15:27:45 -07:00

cuda_utils_kernels.cu

[ROCm] add support to ROCm 6.0 and MI300 (#2274)

2024-01-26 12:41:10 -08:00

cuda_utils.h

[ROCm] add support to ROCm 6.0 and MI300 (#2274)

2024-01-26 12:41:10 -08:00

custom_all_reduce_test.cu

[BugFix] Some fixes for custom allreduce kernels (#2760)

2024-03-21 23:02:58 -07:00

custom_all_reduce.cu

[BugFix] Some fixes for custom allreduce kernels (#2760)

2024-03-21 23:02:58 -07:00

custom_all_reduce.cuh

[BugFix] Some fixes for custom allreduce kernels (#2760)

2024-03-21 23:02:58 -07:00

dispatch_utils.h

DeepseekMoE support with Fused MoE kernel (#2453)

2024-01-29 21:19:48 -08:00

layernorm_kernels.cu

[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)

2024-04-08 14:31:02 -07:00

moe_align_block_size_kernels.cu

[Bugfix] Make moe_align_block_size AMD-compatible (#3470)

2024-03-18 11:26:24 -07:00

ops.h

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

pos_encoding_kernels.cu

Add batched RoPE kernel (#3095)

2024-03-13 13:45:26 -07:00

pybind.cpp

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

reduction_utils.cuh

[Kernel] Layernorm performance optimization (#3662)

2024-03-30 14:26:38 -07:00