xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-12 19:35:01 +08:00

Author	SHA1	Message	Date
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
ElizaWszola	a091e2da3e	[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032) Co-authored-by: Dipika <dipikasikka1@gmail.com>	2024-09-16 09:47:19 -06:00
Dipika Sikka	6cd5e5b07e	[Misc] Fused MoE Marlin support for GPTQ (#8217)	2024-09-09 23:02:52 -04:00
Michael Goin	2ee45281a5	Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165)	2024-09-05 11:09:46 -04:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281)	2024-08-13 14:30:11 -04:00
Lucas Wilkinson	311f743831	[Bugfix] Fix gptq failure on T4s (#7264)	2024-08-07 20:05:37 +00:00
Michael Goin	f9a5600649	[Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225)	2024-08-06 18:34:26 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
Alexander Matveev	0310029a2f	[Bugfix] Fix awq_marlin and gptq_marlin flags (#6745)	2024-07-24 22:34:11 -07:00
Alexander Matveev	396d92d5e0	[Kernel][Core] Add AWQ support to the Marlin kernel (#6612)	2024-07-21 19:41:42 -04:00
Robert Shaw	683e3cb9c4	[ Misc ] `fbgemm` checkpoints (#6559)	2024-07-20 09:36:57 -07:00
Robert Shaw	babf52dade	[ Misc ] More Cleanup of Marlin (#6359) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-13 10:21:37 +00:00
Robert Shaw	b675069d74	[ Misc ] Refactor Marlin Python Utilities (#6082) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-11 15:40:11 +00:00
Robert Shaw	abfe705a02	[ Misc ] Support Fp8 via `llm-compressor` (#6110) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-07-07 20:42:11 +00:00
youkaichao	482045ee77	[hardware][misc] introduce platform abstraction (#6080)	2024-07-02 20:12:22 -07:00
Qubitium-ModelCloud	ee93f4f92a	[CORE] Quantized lm-head Framework (#4442) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ZX <zx@lbx.dev>	2024-07-02 22:25:17 +00:00
youkaichao	614aa51203	[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)	2024-06-30 20:07:34 -07:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017)	2024-06-15 04:45:31 +00:00
Alexander Matveev	5bf185a1c4	[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#5108)	2024-05-30 00:30:18 +00:00
Alexander Matveev	6979ade384	Add GPTQ Marlin 2:4 sparse structured support (#4790) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-05-16 12:56:15 -04:00
Jinzhen Lin	99caa49106	[Kernel] add bfloat16 support for gptq marlin kernel (#4788)	2024-05-16 09:55:29 -04:00
alexm-nm	7038e8b803	[Kernel] Support running GPTQ 8-bit models in Marlin (#4533)	2024-05-02 12:56:22 -04:00
Kunshang Ji	26f2fb5113	[Core]Refactor gptq_marlin ops (#4466)	2024-04-30 08:14:47 -04:00
Robert Shaw	73c8d677e5	[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922) Co-authored-by: alexm <alexm@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-04-29 09:35:34 -07:00

24 Commits