vllm/vllm/attention/__init__.py

# SPDX-License-Identifier: Apache-2.0
from vllm.attention.backends.abstract import (AttentionBackend,
                                              AttentionMetadata,
                                              AttentionMetadataBuilder,
                                              AttentionState, AttentionType)
from vllm.attention.backends.utils import get_flash_attn_version
from vllm.attention.layer import Attention
from vllm.attention.selector import get_attn_backend

__all__ = [
    "Attention",
    "AttentionBackend",
    "AttentionMetadata",
    "AttentionType",
    "AttentionMetadataBuilder",
    "AttentionState",
    "get_attn_backend",
    "get_flash_attn_version",
]
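
# A minimal self-check sketch, assuming a working vLLM installation (this
# __main__ guard is a hypothetical addition, not part of the upstream file):
# since this module exists purely to re-export the public attention API,
# every name listed in __all__ should resolve as an attribute of the package.
if __name__ == "__main__":
    import vllm.attention as attn

    for name in attn.__all__:
        assert hasattr(attn, name), f"missing re-export: {name}"
    print("all public attention names resolve")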