12 Commits

Author SHA1 Message Date
ElizaWszola
a091e2da3e
[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-16 09:47:19 -06:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ (#8217)
2024-09-09 23:02:52 -04:00
Kyle Sayers
c7cb5c3335
[Misc] GPTQ Activation Ordering (#8135)
2024-09-09 16:27:26 -04:00
Dipika Sikka
2188a60c7e
[Misc] Update GPTQ to use vLLMParameters (#7976)
2024-09-03 17:21:44 -04:00
Dipika Sikka
fc911880cc
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-27 15:07:09 -07:00
Dipika Sikka
665304092d
[Misc] Update qqq to use vLLMParameters (#7805)
2024-08-26 13:16:15 -06:00
Dipika Sikka
f1df5dbfd6
[Misc] Update marlin to use vLLMParameters (#7803)
2024-08-23 14:30:52 -04:00
Dipika Sikka
955b5191c9
[Misc] update fp8 to use vLLMParameter (#7437)
2024-08-22 08:36:18 -04:00
Michael Goin
aae74ef95c
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764)
2024-08-22 03:42:14 +00:00
Dipika Sikka
8678a69ab5
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-21 16:17:10 -07:00
Dipika Sikka
b1e5afc3e7
[Misc] Update awq and awq_marlin to use vLLMParameters (#7422)
2024-08-13 17:08:20 -04:00
Dipika Sikka
fb377d7e74
[Misc] Update gptq_marlin to use new vLLMParameters (#7281)
2024-08-13 14:30:11 -04:00