| Author | Commit | Message | Date |
|---|---|---|---|
| ElizaWszola | b00b33d77e | [Model][Quantization] HQQ support through Marlin kernel expansion (#9766); Signed-off-by: ElizaWszola <eliza@neuralmagic.com> | 2024-11-19 13:31:12 -08:00 |
| Russell Bryant | 3be5b26a76 | [CI/Build] Add shell script linting using shellcheck (#7925); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2024-11-07 18:17:29 +00:00 |
| Michael Goin | ce00231a8b | [Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213) | 2024-10-10 14:15:40 +08:00 |
| ElizaWszola | 05d686432f | [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973); Co-authored-by: Dipika <dipikasikka1@gmail.com>; Co-authored-by: Dipika Sikka <ds3822@columbia.edu> | 2024-10-04 12:34:44 -06:00 |
| Michael Goin | 873edda6cf | [Misc] Support FP8 MoE for compressed-tensors (#8588) | 2024-09-25 09:43:36 -07:00 |
| ElizaWszola | a091e2da3e | [Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032); Co-authored-by: Dipika <dipikasikka1@gmail.com> | 2024-09-16 09:47:19 -06:00 |
| Dipika Sikka | 6cd5e5b07e | [Misc] Fused MoE Marlin support for GPTQ (#8217) | 2024-09-09 23:02:52 -04:00 |
| Kyle Sayers | c7cb5c3335 | [Misc] GPTQ Activation Ordering (#8135) | 2024-09-09 16:27:26 -04:00 |
| Dipika Sikka | 2188a60c7e | [Misc] Update GPTQ to use vLLMParameters (#7976) | 2024-09-03 17:21:44 -04:00 |
| Dipika Sikka | fc911880cc | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766); Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-27 15:07:09 -07:00 |
| Dipika Sikka | 665304092d | [Misc] Update qqq to use vLLMParameters (#7805) | 2024-08-26 13:16:15 -06:00 |
| Dipika Sikka | f1df5dbfd6 | [Misc] Update marlin to use vLLMParameters (#7803) | 2024-08-23 14:30:52 -04:00 |
| Dipika Sikka | 955b5191c9 | [Misc] update fp8 to use vLLMParameter (#7437) | 2024-08-22 08:36:18 -04:00 |
| Michael Goin | aae74ef95c | Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764) | 2024-08-22 03:42:14 +00:00 |
| Dipika Sikka | 8678a69ab5 | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527); Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-21 16:17:10 -07:00 |
| Dipika Sikka | b1e5afc3e7 | [Misc] Update awq and awq_marlin to use vLLMParameters (#7422) | 2024-08-13 17:08:20 -04:00 |
| Dipika Sikka | fb377d7e74 | [Misc] Update gptq_marlin to use new vLLMParameters (#7281) | 2024-08-13 14:30:11 -04:00 |