vllm/quantization at 804e3468c04b1a43c0019d2835dabc74b779c1fc - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-20 05:36:01 +08:00

czhu-cohere f6227c22ab

[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)

Signed-off-by: czhu-cohere <conway.zhu@cohere.com>

2025-12-08 19:29:06 -08:00

…

[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)

2025-12-08 19:29:06 -08:00

SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711)

2025-12-01 17:24:18 -08:00

[Performance] Fused blockwise quant RMS norm (#27883)

2025-12-07 16:38:04 +00:00

…

…

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00

hadamard/hadacore

Use narrow over indexing in hadacore_transform to prep for ABI stable (#28756)

2025-11-15 01:10:15 -08:00

…

…

[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)

2025-12-08 19:29:06 -08:00

activation_kernels.cu

[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)

2025-11-13 10:16:55 -08:00

utils.cuh

…

vectorization_utils.cuh

…

vectorization.cuh

…