Divakar Verma
1c1bb0bbf2
[Misc][MoE] add Deepseek-V3 moe tuning support (#12558)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-01-30 00:47:30 +00:00

Harry Mellor
823ab79633
Update pre-commit hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00

Junichi Sato
3bb8e2c9a2
[Misc] Enable proxy support in benchmark script (#12356)
Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
2025-01-24 14:58:26 +00:00

Roger Wang
3c818bdb42
[Misc] Use VisionArena Dataset for VLM Benchmarking (#12389)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-24 00:22:04 -08:00

Junichi Sato
9726ad676d
[Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357)
Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
2025-01-23 17:02:13 -05:00

Gregory Shtrasberg
e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00

Nick Hill
222a9dc350
[Benchmark] More accurate TPOT calc in benchmark_serving.py (#12288)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-01-22 13:46:14 +08:00

Divakar Verma
2acba47d9b
[bugfix] moe tuning. rm is_navi() (#12273)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-01-21 22:47:32 +00:00

gujing
936db119ed
benchmark_serving support --served-model-name param (#12109)
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-01-19 09:59:56 +00:00

Divakar Verma
8027a72461
[ROCm][MoE] moe tuning support for rocm (#12049)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-01-17 14:49:16 +08:00

Varun Sundar Rabindranath
5fd24ec02e
[misc] Add LoRA kernel micro benchmarks (#11579)
2025-01-16 15:51:40 +00:00

elijah
c6db21313c
bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982)
Signed-off-by: elijah <f1renze.142857@gmail.com>
2025-01-13 15:22:07 +00:00

minmin
8a579408f3
[Misc] Update benchmark_prefix_caching.py fixed example usage (#11920)
Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn>
Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>
2025-01-10 20:39:22 +00:00

Kuntai Du
5959564f94
Doc fix in benchmark_long_document_qa_throughput.py (#11933)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-01-10 23:51:43 +08:00

Ye (Charlotte) Qi
1d967acb45
[Bugfix] fix beam search input errors and latency benchmark script (#11875)
Signed-off-by: Ye Qi <yeq@meta.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>
2025-01-09 17:36:39 +08:00

Divakar Verma
4d29e91be8
[Misc] sort torch profiler table by kernel timing (#11813)
2025-01-08 10:57:04 +08:00

Yihua Cheng
0c6f998554
[Benchmark] Add benchmark script for CPU offloading (#11533)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
2025-01-01 00:10:55 +00:00

Jiaxin Shan
fc601665eb
[Misc] Update disaggregation benchmark scripts and test logs (#11456)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2024-12-25 06:58:48 +00:00

Varun Sundar Rabindranath
98356735ac
[misc] benchmark_throughput : Add LoRA (#11267)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-19 15:43:16 +08:00

Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00

Roger Wang
02222a0256
[Misc] Kernel Benchmark for RMSNorm (#11241)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>
2024-12-17 06:57:02 +00:00

Alexander Matveev
238c0d93b4
[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
2024-12-13 16:19:10 +00:00

Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00

Chendi.Xue
82eb5ea8f3
Benchmark serving structured output (#10880)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-12-04 16:28:21 -05:00

Chendi.Xue
381ac93bb5
[Benchmark] Benchmark structured output with datasets (#10557)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2024-12-03 17:21:06 -07:00

Michael Goin
4433195ab7
[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753)
2024-12-03 02:26:15 +00:00

Kuntai Du
0590ec3fd9
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2024-12-01 19:01:00 -06:00

Roger Wang
c11f172187
[Misc] Adding MMMU-Pro vision dataset to serving benchmark (#10804)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-01 08:47:05 +00:00

Wang, Yi
8a93a598d9
fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len (#10524)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-21 11:15:36 +00:00

ElizaWszola
b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00

Ricky Xu
90a6c759ca
[misc] partial prefix & random input generation benchmark (#9929)
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-18 15:39:14 -08:00

Lucas Wilkinson
96d999fbe8
[Kernel] Initial Machete W4A8 support + Refactors (#9855)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-18 12:59:29 -07:00

Jaehyun An
8b6725b0cf
[Misc] Update benchmark to support image_url file or http (#10287)
Signed-off-by: rbbang <anjaehyun87@gmail.com>
2024-11-16 18:15:40 +08:00

Cyrus Leung
f4c2187e29
[Misc] Fix typo in #5895 (#10145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 09:07:01 +00:00

DearPlanet
ad39bd640c
[Bugfix] Add error handling when server cannot respond any valid tokens (#5895)
2024-11-08 04:58:37 +00:00

Cody Yu
201fc07730
[V1] Prefix caching (take 2) (#9972)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-07 17:34:44 -08:00

Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00

Atlas
a62bc0109c
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105)
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-11-07 11:20:30 +00:00

Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00

lkchen
d2e80332a7
[Feature] Update benchmark_throughput.py to support image input (#9851)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-05 19:30:02 +00:00

lkchen
9a5664d4a4
[Misc] Refactor benchmark_throughput.py (#9779)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <lkchen@github.com>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-04 14:32:16 -08:00

Tran Quang Dai
ea4adeddc1
[Bugfix] Fix E2EL mean and median stats (#9984)
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
2024-11-04 09:37:58 +00:00

Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)
2024-10-30 18:15:56 -07:00

wangshuai09
622b7ab955
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00

youkaichao
32176fee73
[torch.compile] support moe models (#9632)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 21:58:04 -07:00

Michael Goin
fd0e2cfdb2
[Misc] Separate total and output tokens in benchmark_throughput.py (#8914)
2024-10-23 16:47:20 +00:00

Chen Zhang
65050a40e6
[Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592)
2024-10-22 17:45:35 -07:00

Jeremy Arnold
cb6fdaa0a0
[Misc] Make benchmarks use EngineArgs (#9529)
2024-10-22 15:40:38 -07:00

Andy Dai
855e0e6f97
[Frontend][Misc] Goodput metric support (#9338)
2024-10-20 18:39:32 +00:00

Russell Bryant
7dbe738d65
[Misc] benchmark: Add option to set max concurrency (#9390)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-18 11:15:28 -07:00