xinyun / vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-23 10:45:01 +08:00
vllm / csrc / quantization
Latest commit: f9a4087182 by Michael Goin, 2025-11-11 11:46:04 -05:00
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431)
Signed-off-by: mgoin <mgoin64@gmail.com>
Directory contents (name: last commit, last updated):

awq: …
cutlass_w4a8: …
fp4: [Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439), 2025-11-07 04:18:39 -08:00
fused_kernels: [torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604), 2025-10-17 08:10:23 -06:00
gguf: …
gptq: [Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092), 2025-10-23 23:26:13 -04:00
gptq_allspark: …
gptq_marlin: Convert formatting to use ruff instead of yapf + isort (#26247), 2025-10-05 07:06:22 -07:00
hadamard/hadacore: …
machete: Update Optional[x] -> x | None and Union[x, y] to x | y (#26633), 2025-10-12 09:51:31 -07:00
marlin/sparse: …
w8a8: Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431), 2025-11-11 11:46:04 -05:00
activation_kernels.cu: [Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366), 2025-11-10 09:21:52 -08:00
utils.cuh: …
vectorization_utils.cuh: …
vectorization.cuh: …