xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-04-08 21:37:10 +08:00
vllm/vllm/attention
Latest commit: 01a583fea4 by jvlunteren, 2025-09-18 14:27:01 +00:00
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Name         Last commit                                                                                 Date
backends     [Bug] Fix is_flashmla_supported Check Error (#24774)                                        2025-09-15 20:10:55 -06:00
layers       Directly get max encoder len from VLLM config in V1 (#24866)                                2025-09-16 17:52:31 +00:00
ops          [Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197)     2025-09-18 14:27:01 +00:00
utils        [Attention] FlashAttn MLA (#14258)                                                          2025-09-04 02:47:59 -07:00
__init__.py  Remove duplicate entry in vllm.attention.__all__ (#23296)                                   2025-08-20 17:14:59 -07:00
layer.py     [XPU] Whisper model support on XPU Platform (#25123)                                         2025-09-18 04:30:10 +00:00
selector.py  [gpt-oss] Enable gpt-oss on ampere (#22714)                                                  2025-08-12 03:21:44 -07:00