Jinzhen Lin
1e60f87bb3
[Kernel] fix moe_align_block_size error condition ( #12239 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-01-21 10:30:28 -08:00
Jinzhen Lin
750f4cabfa
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) ( #12222 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00
Harry Mellor
3ea7b94523
Move linting to pre-commit ( #11975 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
Elfie Guo
0794e7446e
[Misc] Add multipstep chunked-prefill support for FlashInfer ( #10467 )
2025-01-15 12:47:49 +08:00
Jee Jee Li
42f5e7c52a
[Kernel] Support MulAndSilu ( #11624 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-15 02:29:53 +00:00
Wallas Henrique
cfd3219f58
[Hardware][Apple] Native support for macOS Apple Silicon ( #11696 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-08 16:35:49 +08:00
Lu Fang
4068f4b5b5
[MISC] Replace c10::optional with std::optional ( #11730 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
wchen61
5dba257506
Resolve race conditions in Marlin kernel ( #11493 )
...
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-01-02 22:58:56 +00:00
Tyler Michael Smith
970d6d0776
[Build][Kernel] Update CUTLASS to v3.6.0 ( #11607 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-30 17:22:13 +08:00
Simon Mo
f49777ba62
Deepseek v3 ( #11502 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
2024-12-26 16:09:44 -08:00
Varun Sundar Rabindranath
8936316d58
[Kernel] Refactor Cutlass c3x ( #10049 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-19 07:00:18 +00:00
Tyler Michael Smith
5a9da2e6e9
[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) ( #11311 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-19 02:43:30 +00:00
Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support ( #10995 )
...
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion ( #10906 )
...
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00
Woosuk Kwon
3b61cb450d
[V1] Further reduce CPU overheads in flash-attn ( #10989 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-09 12:38:46 -08:00
zhou fan
78029b34ed
[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None ( #10928 )
...
Signed-off-by: xffxff <1247714429@qq.com>
2024-12-08 01:21:18 +08:00
Gregory Shtrasberg
f13cf9ad50
[Build] Fix for the Wswitch-bool clang warning ( #10060 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2024-12-07 09:03:44 +00:00
Tyler Michael Smith
e2251109c7
[Kernel] Remove if-else with identical branches in marlin 2:4 ( #10687 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-26 22:55:32 -08:00
Sanket Kale
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs ( #9228 )
...
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
kliuae
7c25fe45a6
[AMD] Add support for GGUF quantization on ROCm ( #10254 )
2024-11-22 21:14:49 -08:00
Lucas Wilkinson
d200972e7f
[Bugfix] Marlin 2:4 temp fix for large M dim (>256) ( #10464 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-19 19:40:33 -08:00
ElizaWszola
b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion ( #9766 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00
Manjul Mohan
1ea291a417
Fix: Build error seen on Power Architecture ( #10421 )
...
Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Signed-off-by: Angus Wang <wangjadehao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: ismael-dm <ismaeldm99@gmail.com>
Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Angus Wang <wangjadehao@gmail.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Ricky Xu <rickyx@anyscale.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-19 09:34:57 -08:00
Lucas Wilkinson
96d999fbe8
[Kernel] Initial Machete W4A8 support + Refactors ( #9855 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-18 12:59:29 -07:00
Maximilien de Bayser
4a18fd14ba
Support Roberta embedding models ( #9387 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-14 21:23:29 +00:00
Pavani Majety
b6dde33019
[Core] Flashinfer - Remove advance step size restriction ( #10282 )
2024-11-13 16:29:32 +08:00
Aleksandr Malyshev
812c981fa0
Splitting attention kernel file ( #10091 )
...
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2024-11-11 22:55:07 -08:00
Luka Govedič
4f93dfe952
[torch.compile] Fuse RMSNorm with quant ( #9138 )
...
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-11-08 21:20:08 +00:00
Li, Jiang
a6f332d0d9
[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target ( #10108 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 18:42:50 +08:00
Hanzhi Zhou
6192e9b8fe
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce ( #10030 )
...
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
2024-11-06 23:50:47 -08:00
Li, Jiang
a4b3e0c1e9
[Hardware][CPU] Update torch 2.5 ( #9911 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
Mor Zusman
9fb12f7848
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 ( #9838 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-10-31 20:06:25 +00:00
youkaichao
8549c82660
[core] cudagraph output with tensor weak reference ( #9724 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 00:19:28 -07:00
Charlie Fu
59449095ab
[Performance][Kernel] Fused_moe Performance Improvement ( #9384 )
...
Signed-off-by: charlifu <charlifu@amd.com>
2024-10-24 15:37:52 -07:00
Jee Jee Li
295a061fb3
[Kernel] add kernel for FATReLU ( #9610 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-24 16:18:27 +08:00
Lucas Wilkinson
d1e8240875
[Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing ( #9487 )
2024-10-22 15:41:13 -07:00
bnellnm
eca2c5f7c0
[Bugfix] Fix support for dimension like integers and ScalarType ( #9299 )
2024-10-17 19:08:34 +00:00
Li, Jiang
5eda21e773
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support ( #9344 )
2024-10-17 12:21:04 -04:00
rasmith
92d86da217
[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels ( #9391 )
2024-10-17 01:34:06 +00:00
Tyler Michael Smith
c3fab5f769
[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel ( #9425 )
2024-10-16 23:46:06 +00:00
Mor Zusman
fb60ae9b91
[Kernel][Model] Improve continuous batching for Jamba and Mamba ( #9189 )
2024-10-16 12:12:43 -04:00
Lucas Wilkinson
18511aeda6
[Bugfix] Fix Machete unittests failing with NotImplementedError ( #9218 )
2024-10-10 17:39:56 +00:00
Lucas Wilkinson
a64e7b9407
[Bugfix] Machete garbage results for some models (large K dim) ( #9212 )
2024-10-10 14:16:17 +08:00
ElizaWszola
05d686432f
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE ( #8973 )
...
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
2024-10-04 12:34:44 -06:00
Lucas Wilkinson
aeb37c2a72
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) ( #8845 )
2024-10-03 22:55:25 -04:00
Varun Sundar Rabindranath
afb050b29d
[Core] CUDA Graphs for Multi-Step + Chunked-Prefill ( #8645 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-10-02 19:44:39 +00:00
Kevin H. Luu
aaccca2b4d
[CI/Build] Fix machete generated kernel files ordering ( #8976 )
...
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-10-01 03:33:12 +00:00
Mor Zusman
f13a07b1f8
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model ( #8533 )
2024-09-29 17:35:58 -04:00
ElizaWszola
d081da0064
[Bugfix] Fix Marlin MoE act order when is_k_full == False ( #8741 )
...
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-28 18:19:40 -07:00