vllm/csrc at 51826d51fa6ef36963ddd79e99dc77c7660ffbf5 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-08 18:04:39 +08:00

[Kernel] Add more dtype support for GGUF dequantization (#15879)

Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>

2025-04-02 01:58:48 -07:00

attention

[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)

2025-01-23 18:04:03 +00:00

core

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

cpu

[Kernel][CPU] CPU MLA (#14744)

2025-03-25 09:34:59 +00:00

cutlass_extensions

[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)

2025-03-27 00:54:44 +00:00

mamba

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

moe

[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)

2025-03-12 09:31:19 -07:00

prepare_inputs

[Misc][Easy] Annotate unused vars in the csrc files (#14798)

2025-03-15 12:40:09 +08:00

quantization

[Kernel] Add more dtype support for GGUF dequantization (#15879)

2025-04-02 01:58:48 -07:00

rocm

Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)

2025-03-25 15:36:45 +08:00

sparse/cutlass

[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)

2025-03-11 17:09:03 +00:00

activation_kernels.cu

[Kernel] Support MulAndSilu (#11624)

2025-01-15 02:29:53 +00:00

cache_kernels.cu

[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)

2025-03-18 05:50:19 -07:00

cache.h

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

cuda_compat.h

[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)

2024-06-02 14:13:26 -07:00

cuda_utils_kernels.cu

[NVIDIA] Support nvfp4 quantization (#12784)

2025-02-12 19:51:51 -08:00

cuda_utils.h

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

cuda_view.cu

[V1] Fully Transparent Implementation of CPU Offloading (#15354)

2025-03-31 20:22:34 +08:00

cumem_allocator.cpp

[core] improve error handling when wake up from sleep mode (#12981)

2025-02-10 09:38:57 +08:00

custom_all_reduce_test.cu

[Distributed] Add custom allreduce support for ROCM (#14125)

2025-03-31 22:49:12 -07:00

custom_all_reduce.cu

[Distributed] Add custom allreduce support for ROCM (#14125)

2025-03-31 22:49:12 -07:00

custom_all_reduce.cuh

[Distributed] Add custom allreduce support for ROCM (#14125)

2025-03-31 22:49:12 -07:00

dispatch_utils.h

dynamic distpatch of fp8 kernels (#14245)

2025-03-11 10:54:56 -04:00

layernorm_kernels.cu

[torch.compile] Fuse RMSNorm with quant (#9138)

2024-11-08 21:20:08 +00:00

layernorm_quant_kernels.cu

dynamic distpatch of fp8 kernels (#14245)

2025-03-11 10:54:56 -04:00

ops.h

[Kernel] Add more dtype support for GGUF dequantization (#15879)

2025-04-02 01:58:48 -07:00

permute_cols.cu

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

pos_encoding_kernels.cu

[Kernel] Make rotary_embedding ops more flexible with input shape (#12777)

2025-02-06 08:46:13 -08:00

torch_bindings.cpp

[Kernel] Add more dtype support for GGUF dequantization (#15879)

2025-04-02 01:58:48 -07:00

type_convert.cuh

[torch.compile] Fuse RMSNorm with quant (#9138)

2024-11-08 21:20:08 +00:00