vllm/kernels at 6b46c4b653d1d730a9b75d32b59b9d60f879b9d7 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 22:17:28 +08:00


Woosuk Kwon 752c6ade2e

[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

2025-07-19 13:53:17 -07:00


[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)

2025-07-19 13:53:17 -07:00

[CI] change spell checker from codespell to typos (#18711)

2025-06-11 19:57:10 -07:00

[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218)

2025-07-09 12:53:55 -07:00

Add torch golden impl for moe_align_block_size kernel test (#20653)

2025-07-19 02:32:36 -07:00

[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)

2025-07-12 19:38:45 -07:00

__init__.py

[CI/Build] Move test_utils.py to tests/utils.py (#4425)

2024-05-13 23:50:09 +09:00

allclose_default.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

quant_utils.py

[Kernel] Enable fp8 support for pplx and BatchedTritonExperts (#18864)

2025-07-03 14:55:40 -07:00

test_apply_repetition_penalties.py

[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491)

2025-07-05 19:38:02 -07:00

test_cutlass_mla_decode.py

[NVIDIA] Add Cutlass MLA backend (#17625)

2025-06-03 21:40:26 -07:00

test_flex_attention.py

[Misc] Add SPDX-FileCopyrightText (#20428)

2025-07-04 07:40:42 +00:00

test_fused_quant_activation.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

test_triton_flash_attention.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

utils.py

[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)

2025-07-11 07:51:46 -07:00