xinyun/vllm (mirror of https://git.datalinker.icu/vllm-project/vllm.git)
vllm / vllm / attention
Latest commit 78237e43bf by Michael Goin (2025-09-22 20:26:32 -07:00):
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Name          Last commit                                                                          Date
backends/     [Misc] Remove unused encoder-decoder error strings (#25374)                         2025-09-22 11:04:32 +00:00
layers/       Directly get max encoder len from VLLM config in V1 (#24866)                        2025-09-16 17:52:31 +00:00
ops/          [Bugfix] Remove contiguous output req for context parallel MLA (#25414)             2025-09-22 20:26:32 -07:00
utils/        [Attention] FlashAttn MLA (#14258)                                                  2025-09-04 02:47:59 -07:00
__init__.py   Remove duplicate entry in vllm.attention.__all__ (#23296)                           2025-08-20 17:14:59 -07:00
layer.py      [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)   2025-09-20 04:41:23 +00:00
selector.py   [gpt-oss] Enable gpt-oss on ampere (#22714)                                         2025-08-12 03:21:44 -07:00