vllm/kernels at f70bccac75a0aecc0a5fc934859158a3e1f019a5 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 01:17:19 +08:00


Lucas Wilkinson 86e9c8df29

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

2024-09-23 13:46:26 -04:00

benchmark_aqlm.py

[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)

2024-06-20 17:00:13 -06:00

benchmark_layernorm.py

[CI/Build] Avoid CUDA initialization (#8534)

2024-09-18 10:38:11 +00:00

benchmark_machete.py

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

benchmark_marlin.py

[Misc] Disambiguate quantized types via a new ScalarType (#6396)

2024-08-02 13:51:58 -07:00

benchmark_moe.py

[CI/Build] Avoid CUDA initialization (#8534)

2024-09-18 10:38:11 +00:00

benchmark_paged_attention.py

[CI/Build] Avoid CUDA initialization (#8534)

2024-09-18 10:38:11 +00:00

benchmark_quant.py

[CI/Build] Avoid CUDA initialization (#8534)

2024-09-18 10:38:11 +00:00

benchmark_rope.py

[CI/Build] Avoid CUDA initialization (#8534)

2024-09-18 10:38:11 +00:00

benchmark_shapes.py

Add marlin unit tests and marlin benchmark script (#4815)

2024-05-16 09:36:49 -04:00

graph_machete_bench.py

[CI/Build] Update Ruff version (#8469)

2024-09-18 11:00:56 +00:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

weight_shapes.py

[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)

2024-08-20 07:09:33 -06:00