History of vllm/vllm/attention
Latest commit: f366f6339b by William Lin
[spec decode] [4/N] Move update_flash_attn_metadata to attn backend (#7571)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-16 11:41:56 -07:00
backends | [spec decode] [4/N] Move update_flash_attn_metadata to attn backend (#7571) | 2024-08-16 11:41:56 -07:00
ops | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) | 2024-08-12 22:47:41 +00:00
__init__.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00
layer.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00
selector.py | [hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102) | 2024-08-13 00:16:42 -07:00