vllm/cudagraph at dc464a3d3937e30267514e1fc5b988a35dd9dbdf - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-27 07:07:52 +08:00

[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

2025-09-22 10:37:43 -07:00

__init__.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_cudagraph_dispatch.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_cudagraph_mode.py

[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

2025-09-22 10:37:43 -07:00