| Author | Commit | Message | Date |
|---|---|---|---|
| Aaron Pham | 9d104b5beb | [CI/Build] Update Ruff version (#8469); Signed-off-by: Aaron Pham <contact@aarnphm.xyz>; Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-09-18 11:00:56 +00:00 |
| ElizaWszola | a091e2da3e | [Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032); Co-authored-by: Dipika <dipikasikka1@gmail.com> | 2024-09-16 09:47:19 -06:00 |
| Dipika Sikka | 6cd5e5b07e | [Misc] Fused MoE Marlin support for GPTQ (#8217) | 2024-09-09 23:02:52 -04:00 |
| Michael Goin | 2ee45281a5 | Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165) | 2024-09-05 11:09:46 -04:00 |
| Dipika Sikka | fb377d7e74 | [Misc] Update gptq_marlin to use new vLLMParameters (#7281) | 2024-08-13 14:30:11 -04:00 |
| Lucas Wilkinson | 311f743831 | [Bugfix] Fix gptq failure on T4s (#7264) | 2024-08-07 20:05:37 +00:00 |
| Michael Goin | f9a5600649 | [Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225) | 2024-08-06 18:34:26 -07:00 |
| Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00 |
| Alexander Matveev | 0310029a2f | [Bugfix] Fix awq_marlin and gptq_marlin flags (#6745) | 2024-07-24 22:34:11 -07:00 |
| Alexander Matveev | 396d92d5e0 | [Kernel][Core] Add AWQ support to the Marlin kernel (#6612) | 2024-07-21 19:41:42 -04:00 |
| Robert Shaw | 683e3cb9c4 | [ Misc ] fbgemm checkpoints (#6559) | 2024-07-20 09:36:57 -07:00 |
| Robert Shaw | babf52dade | [ Misc ] More Cleanup of Marlin (#6359); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-07-13 10:21:37 +00:00 |
| Robert Shaw | b675069d74 | [ Misc ] Refactor Marlin Python Utilities (#6082); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-07-11 15:40:11 +00:00 |
| Robert Shaw | abfe705a02 | [ Misc ] Support Fp8 via llm-compressor (#6110); Co-authored-by: Robert Shaw <rshaw@neuralmagic> | 2024-07-07 20:42:11 +00:00 |
| youkaichao | 482045ee77 | [hardware][misc] introduce platform abstraction (#6080) | 2024-07-02 20:12:22 -07:00 |
| Qubitium-ModelCloud | ee93f4f92a | [CORE] Quantized lm-head Framework (#4442); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>; Co-authored-by: ZX <zx@lbx.dev> | 2024-07-02 22:25:17 +00:00 |
| youkaichao | 614aa51203 | [misc][cuda] use nvml to avoid accidentally cuda initialization (#6007) | 2024-06-30 20:07:34 -07:00 |
| Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| Alexander Matveev | 5bf185a1c4 | [Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#5108) | 2024-05-30 00:30:18 +00:00 |
| Alexander Matveev | 6979ade384 | Add GPTQ Marlin 2:4 sparse structured support (#4790); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-05-16 12:56:15 -04:00 |
| Jinzhen Lin | 99caa49106 | [Kernel] add bfloat16 support for gptq marlin kernel (#4788) | 2024-05-16 09:55:29 -04:00 |
| alexm-nm | 7038e8b803 | [Kernel] Support running GPTQ 8-bit models in Marlin (#4533) | 2024-05-02 12:56:22 -04:00 |
| Kunshang Ji | 26f2fb5113 | [Core]Refactor gptq_marlin ops (#4466) | 2024-04-30 08:14:47 -04:00 |
| Robert Shaw | 73c8d677e5 | [Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922); Co-authored-by: alexm <alexm@neuralmagic.com>; Co-authored-by: mgoin <michael@neuralmagic.com> | 2024-04-29 09:35:34 -07:00 |