| Author | Commit | Message | Date |
|---|---|---|---|
| Dipika Sikka | 60508ffda9 | [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995); Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>; Co-authored-by: ilmarkov <markovilya197@gmail.com>; Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>; Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> | 2024-12-18 09:57:16 -05:00 |
| bnellnm | eca2c5f7c0 | [Bugfix] Fix support for dimension like integers and ScalarType (#9299) | 2024-10-17 19:08:34 +00:00 |
| Lucas Wilkinson | aeb37c2a72 | [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845) | 2024-10-03 22:55:25 -04:00 |
| ElizaWszola | d081da0064 | [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741); Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-09-28 18:19:40 -07:00 |
| bnellnm | c166e7e43e | [Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886) | 2024-08-27 23:13:45 -04:00 |
| bnellnm | 7759ae958f | [Kernel][Misc] dynamo support for ScalarType (#7594) | 2024-08-16 13:59:49 -07:00 |
| Lucas Wilkinson | 6aa33cb2dd | [Misc] Use scalar type to dispatch to different gptq_marlin kernels (#7323) | 2024-08-12 14:40:13 -04:00 |
| Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00 |