Tyler Michael Smith
1236aebf0e
Merge remote-tracking branch 'origin/main' into fp8_ep_dp
2025-06-02 14:53:27 -04:00
vllmellm
0f5e0d567e
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-31 03:39:31 -07:00
Bill Nell
e69879996f
re-enable cudagraph+torch.compile
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-30 00:12:54 +00:00
Bill Nell
922165cba3
fp8 + pplx tests + fixes
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-29 21:25:33 +00:00
Bill Nell
12ea698498
pplx + fp8 test
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-29 18:50:37 +00:00
Bill Nell
3ca8322b74
lint
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:56 +00:00
Bill Nell
03b41b6cad
fix merge
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:56 +00:00
Bill Nell
468d16654a
cleanup quantization
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:53 +00:00
Bill Nell
909f234faa
stuff
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
f8510587c2
tests + fix
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
9cfebf51ba
basic working test
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
77f95b99a6
test
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
bbe888d033
wip
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
25ed6738d4
wip
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Bill Nell
e568e401da
fp8 support
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-28 23:40:27 +00:00
Tyler Michael Smith
6e588da0f4
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-22 12:13:54 -07:00
bnellnm
c6c10ca920
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-21 06:46:39 -07:00
bnellnm
92247c522e
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-20 22:37:08 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00
TJian
612c2edb4f
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-14 03:03:11 -07:00
vllmellm
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 22:03:47 -07:00
Jinzhen Lin
d74e5f37bc
[Kernel] fp4 marlin kernel (#17687)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-10 19:58:49 -07:00
Pavani Majety
0c0fdae84f
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)
2025-05-09 16:24:41 -07:00
Cyrus Leung
6e5595ca39
[CI/Build] Automatically retry flaky tests (#17856)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-09 09:55:17 -06:00
Jinzhen Lin
1d0c9d6b2d
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-05 09:39:30 -07:00
Caleb_Du
3e887d2e0c
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
2025-05-02 11:31:55 -07:00
Michael Goin
6317a5174a
Categorize tests/kernels/ based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 09:21:07 -04:00