xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-01-08 04:18:43 +08:00
vllm/vllm/attention
Latest commit: e39ebf5cf5 by Elfie Guo, 2024-09-05 05:12:26 +00:00
[Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)
backends/    2024-09-05  [Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)
ops/         2024-08-12  [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) (FP8 scaling sketched below)
__init__.py  2024-08-20  [Core] Add AttentionState abstraction (#7663)
layer.py     2024-08-06  [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
selector.py  2024-08-29  [Core][Kernels] Enable FP8 KV Cache with Flashinfer backend. + BugFix for kv_cache_dtype=auto (#7985) (backend selection sketched below)
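
The ops/ entry above (#7208) adds an FP8 K/V scale and dtype conversion to the prefix/prefill Triton kernel. Below is a minimal PyTorch sketch of the underlying idea, scale-then-cast quantization of the KV cache with the scale retained for dequantization. The function names (quantize_kv, dequantize_kv) and the per-tensor scaling scheme are illustrative assumptions, not vLLM's actual Triton kernel interface.

```python
# Sketch of per-tensor FP8 K/V cache quantization with explicit scales
# (cf. #7208). Illustrative only: quantize_kv/dequantize_kv are hypothetical
# names, not vLLM's kernel API. Requires PyTorch >= 2.1 for float8 dtypes.
import torch

def quantize_kv(k: torch.Tensor, v: torch.Tensor, k_scale: float, v_scale: float):
    """Divide by the scale, then cast K/V to FP8 (e4m3) for cache storage."""
    k_fp8 = (k / k_scale).to(torch.float8_e4m3fn)
    v_fp8 = (v / v_scale).to(torch.float8_e4m3fn)
    return k_fp8, v_fp8

def dequantize_kv(k_fp8: torch.Tensor, v_fp8: torch.Tensor,
                  k_scale: float, v_scale: float, dtype=torch.float16):
    """Cast back to the compute dtype and re-apply the scale before attention."""
    return k_fp8.to(dtype) * k_scale, v_fp8.to(dtype) * v_scale
```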
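selector.py chooses an attention backend at runtime; the #7985 entry above extends it so FlashInfer can be selected with an FP8 KV cache. The sketch below shows that dispatch pattern. The VLLM_ATTENTION_BACKEND override variable is real in vLLM of this era, but the helper name which_backend and the specific fallback rules here are simplified assumptions, not the actual selection logic in selector.py.

```python
# Illustrative sketch of the backend-selection pattern selector.py implements.
# VLLM_ATTENTION_BACKEND is a real vLLM override; the rules below are
# simplified assumptions, not vLLM's actual logic.
import os
from enum import Enum, auto

class _Backend(Enum):
    FLASH_ATTN = auto()
    FLASHINFER = auto()
    XFORMERS = auto()

def which_backend(head_size: int, kv_cache_dtype: str) -> _Backend:
    # An explicit user override always wins.
    override = os.environ.get("VLLM_ATTENTION_BACKEND")
    if override in _Backend.__members__:
        return _Backend[override]
    # An FP8 KV cache needs a backend that understands it (cf. #7985).
    if kv_cache_dtype.startswith("fp8"):
        return _Backend.FLASHINFER
    # FlashAttention supports only certain head sizes (list is illustrative);
    # fall back to a more permissive backend otherwise.
    if head_size in (64, 80, 96, 112, 128, 256):
        return _Backend.FLASH_ATTN
    return _Backend.XFORMERS
```

For example, which_backend(128, "auto") would return FLASH_ATTN under these rules, while which_backend(128, "fp8") would return FLASHINFER.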