xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-16 01:27:02 +08:00

Author	SHA1	Message	Date
Lucas Wilkinson	aeb37c2a72	[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)	2024-10-03 22:55:25 -04:00
ElizaWszola	d081da0064	[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-28 18:19:40 -07:00
bnellnm	c166e7e43e	[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886)	2024-08-27 23:13:45 -04:00
bnellnm	7759ae958f	[Kernel][Misc] dynamo support for ScalarType (#7594)	2024-08-16 13:59:49 -07:00
Lucas Wilkinson	6aa33cb2dd	[Misc] Use scalar type to dispatch to different `gptq_marlin` kernels (#7323)	2024-08-12 14:40:13 -04:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00