Jialin Ouyang | 31a500c86f | 2025-08-13 14:44:06 -07:00
[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Daniel Serebrenik | e5d3d63c42 | 2025-08-12 14:41:37 +00:00
[Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) (#22730)
Signed-off-by: daniels <daniels@pliops.com>

Jee Jee Li | 384a052971 | 2025-08-11 00:13:27 -07:00
[Misc] benchmark_moe supports expert parallel (#22251)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Breno Baldas Skuk | 65a7917be4 | 2025-08-10 09:03:15 -07:00
Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks (#22534)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>

TJian | 42172ad18f | 2025-08-09 11:50:03 -07:00
[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

Daniel Serebrenik | f0964e29cb | 2025-08-08 10:28:50 -07:00
[Benchmark] Add benchmark tool for multi turn conversations (#20267)

Syed Muhammad Bin Asif | 609b533cb6 | 2025-08-06 20:31:03 -07:00
[Bugfix] Add proper comparison for package versions (#22314)
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>

elvischenv | 83156c7b89 | 2025-08-05 02:45:34 -07:00
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

ericehanley | 309c1bb822 | 2025-08-04 15:12:06 +00:00
[Bug] Update auto_tune.sh to separate benchmarking and profiling. (#21629)
Signed-off-by: Eric Hanley <ericehanley@google.com>

Roger Wang | 067c34a155 | 2025-08-02 00:19:48 -07:00
docs: remove deprecated disable-log-requests flag (#22113)
Signed-off-by: Roger Wang <hey@rogerw.me>

Wentao Ye | eefbf4a68b | 2025-08-01 19:18:51 -04:00
[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Jee Jee Li | 8d705996df | 2025-08-02 01:35:30 +08:00
[Misc] Minor enhancement of benchmark_moe (#22068)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Wentao Ye | 3700642013 | 2025-08-01 01:13:27 +00:00
[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | 0271c2ff2f | 2025-07-30 07:15:02 -07:00
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Peter Pan | 533db0935d | 2025-07-30 01:15:43 -07:00
[benchmark] add max-concurrency in result table (#21095)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

Harry Mellor | ba5c5e5404 | 2025-07-29 19:45:08 -07:00
[Docs] Switch to better markdown linting pre-commit hook (#21851)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

elvischenv | 58b11b24a6 | 2025-07-29 10:34:00 -04:00
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend (#21525)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

rongfu.leng | 18cc33dd60 | 2025-07-27 22:44:24 -07:00
[bugfix] fix profile impact benchmark results (#21507)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

Caleb_Du | 57c22e57f9 | 2025-07-27 07:08:00 -07:00
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>

Ye (Charlotte) Qi | 01a395e9e7 | 2025-07-27 04:02:12 +00:00
[CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

Ye (Charlotte) Qi | e7c4f9ee86 | 2025-07-26 07:10:14 -07:00
[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

Wentao Ye | 56e544f24b | 2025-07-26 07:08:29 -07:00
[Refactor] Remove moe_align_block_size_triton (#21335)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Huy Do | e98def439c | 2025-07-26 06:06:05 -07:00
[Take 2] Correctly kill vLLM processes after benchmarks (#21646)
Signed-off-by: Huy Do <huydhn@gmail.com>

Chengji Yao | 947edd099e | 2025-07-24 22:46:43 -07:00
[Misc][Tools] make max-model-len a parameter in auto_tune script (#21321)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Cyrus Leung | 34ddcf9ff4 | 2025-07-24 20:05:55 -07:00
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

ericehanley | 4f76a05f4f | 2025-07-22 20:33:00 -07:00
[BugFix] Update python to python3 calls for image; fix prefix & input calculations. (#21391)
Signed-off-by: Eric Hanley <ericehanley@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Jialin Ouyang | 10904e6d75 | 2025-07-22 05:28:00 -07:00
[benchmark] Port benchmark request sent optimization to benchmark_serving (#21209)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Ming Yang | e7b2042681 | 2025-07-21 21:49:01 -07:00
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)" (#21334)
Signed-off-by: Ming Yang <minos.future@gmail.com>

Himanshu Jaju | 0ec82edda5 | 2025-07-21 11:19:23 -07:00
[perf] Speed up align sum kernels (#21079)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>

Yuxuan Zhang | 10eb24cc91 | 2025-07-19 22:40:31 +00:00
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>

Chenyaaang | 3a2cb2649d | 2025-07-19 09:06:59 +00:00
[Misc][Tools][Benchmark] Add readme file for auto_tune script (#20779)
Signed-off-by: Chenyaaang <chenyangli@google.com>

JialinOuyang-Meta | 0f199f197b | 2025-07-18 12:34:40 -07:00
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>

ElizaWszola | 9fb2d22032 | 2025-07-17 09:56:44 -04:00
[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>

Asher | 5a7fb3ab9e | 2025-07-17 09:10:09 +00:00
[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>

Pavani Majety | 7bd4c37ae7 | 2025-07-11 09:23:23 +00:00
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: shuw <shuw@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

Luka Govedič | 31d5c1797f | 2025-07-11 04:56:28 +00:00
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

Wentao Ye | e2de455c34 | 2025-07-10 20:18:05 -07:00
[Feature] Integrate SM100 DeepGEMM support (#20087)

Kuntai Du | 5b6fe23d05 | 2025-07-10 14:52:46 -07:00
[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Michael Goin | 0bbac1c1b4 | 2025-07-09 13:23:48 -04:00
[Bench] Add NVFP4 GEMM benchmark script (#20578)
Signed-off-by: mgoin <mgoin64@gmail.com>

Li Wang | 9ff2af6d2b | 2025-07-09 13:35:16 +00:00
[Benchmark] Parameterization of streaming loading of multimodal datasets (#20528)
Signed-off-by: wangli <wangli858794774@gmail.com>

Brayden Zhong | cede942b87 | 2025-07-06 09:20:11 +00:00
[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

Jee Jee Li | 1caca5a589 | 2025-07-04 07:40:42 +00:00
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

bnellnm | c1909e7e8c | 2025-07-02 06:08:27 -07:00
[Kernels] MoE refactor (#19636)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>

Kebe | b1c1fe35a5 | 2025-07-01 15:33:22 +08:00
[Misc] remove redundant char (#20287)
Signed-off-by: Kebe <mail@kebe7jun.com>

czhu-cohere | 9909726d2a | 2025-07-01 07:12:20 +00:00
Enable ZP Support for Machete (#20268)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>

Reid | 167aca45cb | 2025-06-26 03:35:16 -07:00
[Misc] Use collapsible blocks for benchmark examples. (#20017)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Ekagra Ranjan | 9502c38138 | 2025-06-25 22:06:27 -07:00
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)

Wentao Ye | 879f69bed3 | 2025-06-25 05:19:09 +00:00
[Refactor] Remove duplicate ceil_div (#20023)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | a6c4b87fbc | 2025-06-24 19:45:22 -07:00
Revert "[Feature] Integrate new deepgemm (#19820)" (#20049)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | c6e3bba8e6 | 2025-06-24 12:51:56 -07:00
[Feature] Integrate new deepgemm (#19820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>