vllm/attention at 6fc4e6e07a55559c3744212b4d562e20d024e661 - vllm - Silk Road New Cloud (丝路新云) code repository

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-28 15:57:25 +08:00


Cody Yu 9606c7197d

Revert #7509 (#7887)

2024-08-27 00:16:31 -07:00


Revert #7509 (#7887)

2024-08-27 00:16:31 -07:00

[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)

2024-08-12 22:47:41 +00:00

__init__.py

[Core] Add AttentionState abstraction (#7663)

2024-08-20 18:50:45 +00:00

layer.py

[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)

2024-08-06 16:51:47 -04:00

selector.py

[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)

2024-08-13 00:16:42 -07:00