vllm/csrc at bc34937d68e9715d8416457539fb528301cf6269 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-12 00:27:23 +08:00

Varun Sundar Rabindranath 6c916ac8a8

[BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744)

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

2024-06-23 21:07:11 +00:00

attention

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

cpu

[Kernel][CPU] Add Quick gelu to CPU (#5717)

2024-06-21 06:39:40 +00:00

moe

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

punica

[Kernel] Add punica dimension for Qwen2 LoRA (#5441)

2024-06-20 17:55:41 -07:00

quantization

[BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744)

2024-06-23 21:07:11 +00:00

activation_kernels.cu

[Model] Port over CLIPVisionModel for VLMs (#5591)

2024-06-20 11:52:09 +00:00

cache_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

cache.h

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

cuda_compat.h

[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)

2024-06-02 14:13:26 -07:00

cuda_utils_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

cuda_utils.h

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

custom_all_reduce_test.cu

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

custom_all_reduce.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

custom_all_reduce.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

dispatch_utils.h

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

layernorm_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

moe_align_block_size_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

ops.h

[Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels (#5715)

2024-06-20 18:36:10 +00:00

pos_encoding_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

reduction_utils.cuh

[Kernel] Dynamic Per-Token Activation Quantization (#5037)

2024-06-07 09:36:26 -07:00

registration.h

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

torch_bindings.cpp

[Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels (#5715)

2024-06-20 18:36:10 +00:00