xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-03-19 20:07:18 +08:00
vllm/tests/kernels
Latest commit: 2b6b1d7809 [Model] Mamba2 varlen refactor (#21467) by Chih-Chieh Yang, 2025-09-26 11:31:14 +00:00
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Name | Last commit | Last updated
attention | [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489) | 2025-09-25 17:37:50 +00:00
core | [mypy] Fix wrong type annotations related to tuple (#25660) | 2025-09-25 13:00:45 +00:00
mamba | [Model] Mamba2 varlen refactor (#21467) | 2025-09-26 11:31:14 +00:00
moe | [Model] Add LongCat-Flash (#23991) | 2025-09-24 21:53:40 -07:00
quantization | Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607) | 2025-09-25 08:05:21 +00:00
__init__.py | … | …
allclose_default.py | … | …
quant_utils.py | [Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787) | 2025-08-01 01:13:27 +00:00
test_apply_repetition_penalties.py | … | …
test_flex_attention.py | Updates to Flex + VLLm integration (#21416) | 2025-08-25 09:32:42 -04:00
test_fused_quant_activation.py | … | …
test_onednn.py | [mypy] Fix wrong type annotations related to tuple (#25660) | 2025-09-25 13:00:45 +00:00
test_shuffle_rows.py | [Bugfix] Fix CUDA arch flags for MoE permute (#21426) | 2025-07-24 03:23:59 -07:00
test_triton_flash_attention.py | … | …
utils.py | [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489) | 2025-09-25 17:37:50 +00:00
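The entries above are vLLM's kernel test suites. As a minimal sketch of how one of them might be run locally (assuming a vLLM development checkout with its test dependencies installed; the tests/kernels/mamba path comes from the listing above, and this pytest invocation is an illustration rather than a documented project command):

    # Sketch: run the Mamba kernel tests from the listing with pytest.
    # Assumes a vLLM development checkout with test dependencies installed;
    # the tests/kernels/mamba path is taken from the directory listing above.
    import sys

    import pytest

    if __name__ == "__main__":
        # -v prints per-test results; -x stops at the first failure.
        sys.exit(pytest.main(["tests/kernels/mamba", "-v", "-x"]))

Any individual file, such as test_flex_attention.py, could be passed to pytest the same way.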