xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-21 01:37:01 +08:00

Author	SHA1	Message	Date
Dipika Sikka	60508ffda9	[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995 ) Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-18 09:57:16 -05:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Woosuk Kwon	3b61cb450d	[V1] Further reduce CPU overheads in flash-attn (#10989 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-09 12:38:46 -08:00
zhou fan	78029b34ed	[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928 ) Signed-off-by: xffxff <1247714429@qq.com>	2024-12-08 01:21:18 +08:00
Gregory Shtrasberg	f13cf9ad50	[Build] Fix for the Wswitch-bool clang warning (#10060 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2024-12-07 09:03:44 +00:00
Tyler Michael Smith	e2251109c7	[Kernel] Remove if-else with identical branches in marlin 2:4 (#10687 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-11-26 22:55:32 -08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228 ) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
kliuae	7c25fe45a6	[AMD] Add support for GGUF quantization on ROCm (#10254 )	2024-11-22 21:14:49 -08:00
Lucas Wilkinson	d200972e7f	[Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2024-11-19 19:40:33 -08:00
ElizaWszola	b00b33d77e	[Model][Quantization] HQQ support through Marlin kernel expansion (#9766 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com>	2024-11-19 13:31:12 -08:00
Manjul Mohan	1ea291a417	Fix: Build error seen on Power Architecture (#10421 ) Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: ismael-dm <ismaeldm99@gmail.com> Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: yan ma <yan.ma@intel.com> Signed-off-by: Angus Wang <wangjadehao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com> Co-authored-by: B-201 <Joy25810@foxmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: ismael-dm <ismaeldm99@gmail.com> Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Angus Wang <wangjadehao@gmail.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Ricky Xu <rickyx@anyscale.com> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-19 09:34:57 -08:00
Lucas Wilkinson	96d999fbe8	[Kernel] Initial Machete W4A8 support + Refactors (#9855 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2024-11-18 12:59:29 -07:00
Maximilien de Bayser	4a18fd14ba	Support Roberta embedding models (#9387 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-14 21:23:29 +00:00
Pavani Majety	b6dde33019	[Core] Flashinfer - Remove advance step size restriction (#10282 )	2024-11-13 16:29:32 +08:00
Aleksandr Malyshev	812c981fa0	Splitting attention kernel file (#10091 ) Signed-off-by: maleksan85 <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2024-11-11 22:55:07 -08:00
Luka Govedič	4f93dfe952	[torch.compile] Fuse RMSNorm with quant (#9138 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: youkaichao <youkaichao@126.com>	2024-11-08 21:20:08 +00:00
Li, Jiang	a6f332d0d9	[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 18:42:50 +08:00
Hanzhi Zhou	6192e9b8fe	[Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030 ) Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>	2024-11-06 23:50:47 -08:00
Li, Jiang	a4b3e0c1e9	[Hardware][CPU] Update torch 2.5 (#9911 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 04:43:08 +00:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Mor Zusman	9fb12f7848	[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-10-31 20:06:25 +00:00
youkaichao	8549c82660	[core] cudagraph output with tensor weak reference (#9724 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 00:19:28 -07:00
Charlie Fu	59449095ab	[Performance][Kernel] Fused_moe Performance Improvement (#9384 ) Signed-off-by: charlifu <charlifu@amd.com>	2024-10-24 15:37:52 -07:00
Jee Jee Li	295a061fb3	[Kernel] add kernel for FATReLU (#9610 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-10-24 16:18:27 +08:00
Lucas Wilkinson	d1e8240875	[Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487 )	2024-10-22 15:41:13 -07:00
bnellnm	eca2c5f7c0	[Bugfix] Fix support for dimension like integers and ScalarType (#9299 )	2024-10-17 19:08:34 +00:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344 )	2024-10-17 12:21:04 -04:00
rasmith	92d86da217	[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391 )	2024-10-17 01:34:06 +00:00
Tyler Michael Smith	c3fab5f769	[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425 )	2024-10-16 23:46:06 +00:00
Mor Zusman	fb60ae9b91	[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189 )	2024-10-16 12:12:43 -04:00
Lucas Wilkinson	18511aeda6	[Bugfix] Fix Machete unittests failing with `NotImplementedError` (#9218 )	2024-10-10 17:39:56 +00:00
Lucas Wilkinson	a64e7b9407	[Bugfix] Machete garbage results for some models (large K dim) (#9212 )	2024-10-10 14:16:17 +08:00
ElizaWszola	05d686432f	[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973 ) Co-authored-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu>	2024-10-04 12:34:44 -06:00
Lucas Wilkinson	aeb37c2a72	[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845 )	2024-10-03 22:55:25 -04:00
Varun Sundar Rabindranath	afb050b29d	[Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-02 19:44:39 +00:00
Kevin H. Luu	aaccca2b4d	[CI/Build] Fix machete generated kernel files ordering (#8976 ) Signed-off-by: kevin <kevin@anyscale.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-10-01 03:33:12 +00:00
Mor Zusman	f13a07b1f8	[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533 )	2024-09-29 17:35:58 -04:00
ElizaWszola	d081da0064	[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741 ) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-28 18:19:40 -07:00
Varun Sundar Rabindranath	c2ec430ab5	[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-09-27 13:32:07 -07:00
Tyler Michael Smith	71d21c73ab	[Bugfix] Fixup advance_step.cu warning (#8815 )	2024-09-26 16:23:45 -07:00
bnellnm	300da09177	[Kernel] Fullgraph and opcheck tests (#8479 )	2024-09-25 08:35:52 -06:00
sasha0552	b4522474a3	[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776 )	2024-09-24 21:26:33 -07:00
ElizaWszola	a928ded995	[Kernel] Split Marlin MoE kernels into multiple files (#8661 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-09-24 09:31:42 -07:00
Hanzhi Zhou	cc4325b66a	[Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558 )	2024-09-24 01:08:14 -07:00
Lucas Wilkinson	86e9c8df29	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701 ) Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-23 13:46:26 -04:00
Tyler Michael Smith	d66ac62854	[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643 )	2024-09-21 23:45:02 +00:00
Charlie Fu	9cc373f390	[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577 )	2024-09-19 17:37:57 +00:00
Tyler Michael Smith	4c34ce8916	[Kernel] Remove marlin moe templating on thread_m_blocks (#8573 ) Co-authored-by: lwilkinson@neuralmagic.com	2024-09-19 01:42:49 +00:00
Tyler Michael Smith	8110e44529	[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (#8012 )	2024-09-17 23:44:27 +00:00
youkaichao	99aa4eddaf	[torch.compile] register allreduce operations as custom ops (#8526 )	2024-09-16 22:57:57 -07:00

262 Commits