wenyujin333
bd43973522
[Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497)
Tune Qwen2-57B-A14B configs based on #4921
Throughput Performance
command: python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2
A100 GPU
benchmark | no config | w/ PR
tp=2 | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s
tp=4 | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s
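The relative gain implied by these figures can be worked out with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the benchmark results above, while the `results` dictionary and the `percent_gain` helper are hypothetical names introduced here and are not part of the PR or of vLLM.

```python
# Throughput figures copied from the benchmark results above.
# Each tuple is (requests/s, tokens/s). The names below are illustrative only.
results = {
    "tp=2": {"no config": (10.53, 11058.17), "w/ PR": (12.47, 13088.57)},
    "tp=4": {"no config": (17.77, 18662.95), "w/ PR": (20.20, 21212.32)},
}


def percent_gain(before: float, after: float) -> float:
    """Return the relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Map each tensor-parallel setting to (requests/s gain %, tokens/s gain %).
gains = {}
for tp, row in results.items():
    base_req, base_tok = row["no config"]
    tuned_req, tuned_tok = row["w/ PR"]
    gains[tp] = (
        percent_gain(base_req, tuned_req),
        percent_gain(base_tok, tuned_tok),
    )
    print(f"{tp}: requests/s +{gains[tp][0]:.1f}%, tokens/s +{gains[tp][1]:.1f}%")
```

On these numbers the tuned configs give roughly an 18% throughput gain at tp=2 and roughly a 14% gain at tp=4.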
2024-06-13 09:01:10 -07:00