xinyun / vllm (mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-04-12 16:37:07 +08:00)
vllm / tests / kernels
Latest commit: bbe888d033 "wip" by Bill Nell (Signed-off-by: Bill Nell <bnell@redhat.com>), 2025-05-28 23:40:27 +00:00
attention | [Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100) | 2025-05-29 07:21:46 +08:00
core | [Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229) | 2025-05-16 01:32:45 +00:00
mamba | [Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146) | 2025-05-06 17:59:30 -07:00
moe | wip | 2025-05-28 23:40:27 +00:00
quantization | [V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646) | 2025-05-27 04:40:28 +00:00
__init__.py | [CI/Build] Move test_utils.py to tests/utils.py (#4425) | 2024-05-13 23:50:09 +09:00
allclose_default.py | [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00
quant_utils.py | Add missing rocm_skinny_gemms kernel test to CI (#17060) | 2025-04-24 07:49:37 -07:00
test_cutlass_mla_decode.py | [NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032) | 2025-04-27 06:29:21 -07:00
test_fused_quant_activation.py | [AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) | 2025-05-13 22:13:56 -07:00
test_triton_flash_attention.py | [Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591) | 2025-04-27 00:35:08 +00:00
utils.py | [Misc] Replace os environ to monkeypatch in test suite (#14516) | 2025-03-16 20:35:57 -07:00