vllm/attention at 8e4b351a0c9e414b0c56c32cbdef51a21d1ea1be - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-01 02:27:10 +08:00

rasmith 8e4b351a0c

[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

2025-04-27 00:35:08 +00:00

[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186)

2025-04-26 05:55:14 -07:00

[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)

2025-04-27 00:35:08 +00:00

__init__.py

[Attention] Flash Attention 3 - fp8 (#14570)

2025-03-20 01:14:20 -04:00

layer.py

[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)

2025-04-25 00:45:02 -07:00

selector.py

Correct capitalisation: VLLM -> vLLM (#14562)

2025-03-10 16:36:21 +00:00