vllm/kernels at cf73f0c95e09836efff876d5bfd9b9c6cc1ba06e - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 19:17:25 +08:00


Wallas Henrique c27df94e1f

[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)

Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

2024-11-25 12:23:32 -05:00

__init__.py

[CI/Build] Move test_utils.py to tests/utils.py (#4425)

2024-05-13 23:50:09 +09:00

allclose_default.py

[ROCm] Fix some kernels failed unit tests (#2498)

2024-02-05 14:25:36 -08:00

conftest.py

[Kernel] Use flashinfer for decoding (#4353)

2024-05-03 15:51:27 -07:00

quant_utils.py

[Hardware][ROCM] using current_platform.is_rocm (#9642)

2024-10-28 04:07:00 +00:00

test_activation.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_aqlm.py

[Kernel] Fullgraph and opcheck tests (#8479)

2024-09-25 08:35:52 -06:00

test_attention_selector.py

[Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)

2024-11-19 11:22:26 +08:00

test_attention.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_awq_marlin.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_awq_triton.py

[Hardware] using current_platform.seed_everything (#9785)

2024-10-29 14:47:44 +00:00

test_awq.py

[Bugfix] Try to handle older versions of pytorch (#9086)

2024-10-08 14:28:12 -07:00

test_blocksparse_attention.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_cache.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_causal_conv1d.py

[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838)

2024-10-31 20:06:25 +00:00

test_cutlass.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_encoder_decoder_attn.py

[misc] move functions to config.py (#10624)

2024-11-25 09:27:30 +00:00

test_flash_attn.py

[Hardware] using current_platform.seed_everything (#9785)

2024-10-29 14:47:44 +00:00

test_flashinfer.py

[Hardware] using current_platform.seed_everything (#9785)

2024-10-29 14:47:44 +00:00

test_fp8_quant.py

[Hardware] using current_platform.seed_everything (#9785)

2024-10-29 14:47:44 +00:00

test_ggml.py

[Kernel] Fullgraph and opcheck tests (#8479)

2024-09-25 08:35:52 -06:00

test_gguf.py

[Hardware] using current_platform.seed_everything (#9785)

2024-10-29 14:47:44 +00:00

test_gptq.py

[Kernel] Fullgraph and opcheck tests (#8479)

2024-09-25 08:35:52 -06:00

test_int8_quant.py

[bugfix] Fix static asymmetric quantization case (#10334)

2024-11-15 09:35:11 +08:00

test_layernorm.py

[torch.compile] Fuse RMSNorm with quant (#9138)

2024-11-08 21:20:08 +00:00

test_machete_mm.py

[Kernel] Initial Machete W4A8 support + Refactors (#9855)

2024-11-18 12:59:29 -07:00

test_mamba_ssm.py

[CI/Build] drop support for Python 3.8 EOL (#8464)

2024-11-06 07:11:55 +00:00

test_marlin_gemm.py

[Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464)

2024-11-19 19:40:33 -08:00

test_moe.py

[Misc] Bump up test_fused_moe tolerance (#10364)

2024-11-15 16:31:18 +00:00

test_permute_cols.py

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

test_pos_encoding.py

[CI] Prune back the number of tests in tests/kernels/* (#9932)

2024-11-05 16:02:32 -05:00

test_prefix_prefill.py

[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)

2024-11-25 12:23:32 -05:00

test_rotary_embedding.py

[Kernel] Fullgraph and opcheck tests (#8479)

2024-09-25 08:35:52 -06:00

test_triton_scaled_mm.py

[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (#9857)

2024-11-08 19:59:22 -05:00

test_utils.py

[Kernel] Fullgraph and opcheck tests (#8479)

2024-09-25 08:35:52 -06:00

utils.py

[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559)

2024-11-01 23:22:49 -07:00