vllm/attention at 1bd32bc8dd685b1bcddc0e7408a46a7c637dae8a - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-01 02:17:03 +08:00

TJian 916836bbfb

[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

2025-03-12 09:31:19 -07:00

[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)

2025-03-12 09:31:19 -07:00

[CPU] Upgrade CPU backend to torch-2.6 (#13381)

2025-03-12 10:41:13 +00:00

__init__.py

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

layer.py

[Bug] Fix Attention when ignored in by quant_method (#14313)

2025-03-06 14:18:06 -08:00

selector.py

Correct capitalisation: VLLM -> vLLM (#14562)

2025-03-10 16:36:21 +00:00