xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-04-11 05:47:03 +08:00)
vllm/vllm/attention
Latest commit c42ff4f4fd: [BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513) by Adrian Abeyta (Signed-off-by: adabeyta <aabeyta@redhat.com>), 2025-09-29 15:52:04 -04:00
backends: [torch.compile] Make Query Quantization Fusable (#24914), 2025-09-25 09:25:12 -04:00
layers: Directly get max encoder len from VLLM config in V1 (#24866), 2025-09-16 17:52:31 +00:00
ops: [Misc] fix tests failure by using current_platform (#25825), 2025-09-29 04:18:57 +00:00
utils: [Attention] FlashAttn MLA (#14258), 2025-09-04 02:47:59 -07:00
__init__.py: [V0 Deprecation] Remove unused classes in attention (#25541), 2025-09-24 13:24:40 -07:00
layer.py: [BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513), 2025-09-29 15:52:04 -04:00
selector.py: [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489), 2025-09-25 17:37:50 +00:00