xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-03-22 19:41:22 +08:00)
vllm/csrc
Zhang Xiangze
13ea39bc09
[CPU]Parallelize over tokens in int4 moe (#29600)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-12-02 06:21:39 +00:00
attention
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
2025-11-26 12:45:28 +08:00
core
…
cpu
[CPU] Update torch 2.9.1 for CPU backend (#29664)
2025-11-28 13:37:54 +00:00
cutlass_extensions
…
mamba/mamba_ssm
…
moe
[CPU]Parallelize over tokens in int4 moe (#29600)
2025-12-02 06:21:39 +00:00
quantization
SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711)
2025-12-01 17:24:18 -08:00
quickreduce
…
rocm
…
sparse/cutlass
…
activation_kernels.cu
…
cache_kernels.cu
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)
2025-11-24 19:05:46 -07:00
cache.h
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)
2025-11-24 19:05:46 -07:00
cub_helpers.h
…
cuda_compat.h
…
cuda_utils_kernels.cu
…
cuda_utils.h
…
cuda_view.cu
Simplify from_blob usage in get_cuda_view_from_cpu_tensor (#29027)
2025-11-22 10:35:32 +00:00
cumem_allocator_compat.h
…
cumem_allocator.cpp
…
custom_all_reduce_test.cu
…
custom_all_reduce.cu
…
custom_all_reduce.cuh
…
custom_quickreduce.cu
…
dispatch_utils.h
…
fused_qknorm_rope_kernel.cu
…
launch_bounds_utils.h
…
layernorm_kernels.cu
…
layernorm_quant_kernels.cu
…
ops.h
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
2025-11-26 12:45:28 +08:00
permute_cols.cu
…
pos_encoding_kernels.cu
…
sampler.cu
…
torch_bindings.cpp
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
2025-11-29 07:19:33 -08:00
type_convert.cuh
…