vllm/vllm/attention
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (last synced 2026-04-27 13:37:06 +08:00)
History
Latest commit: 28566d73b3 by Hongxia Yang, 2025-05-01 07:54:25 -07:00
[ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Name        | Last commit                                                                                                         | Last updated
backends    | [BugFix] Fix mla cpu - missing 3 required positional arguments (#17494)                                            | 2025-05-01 14:36:52 +08:00
ops         | [ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536)                           | 2025-05-01 07:54:25 -07:00
utils       | [BugFix] Fix vllm_flash_attn install issues (#17267)                                                               | 2025-04-27 17:27:56 -07:00
__init__.py | [Attention] Flash Attention 3 - fp8 (#14570)                                                                       | 2025-03-20 01:14:20 -04:00
layer.py    | [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734) | 2025-04-25 00:45:02 -07:00
selector.py | Correct capitalisation: VLLM -> vLLM (#14562)                                                                      | 2025-03-10 16:36:21 +00:00