xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-03-19 07:27:14 +08:00

Wei-Sheng Chin 795b662cff

Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241)

2024-09-06 20:18:16 -07:00

cutlass_benchmarks

[Kernel] Add per-tensor and per-token AZP epilogues (#5941)

2024-08-06 18:17:08 +00:00

[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)

2024-08-21 20:18:00 -04:00

[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)

2024-06-20 17:00:13 -06:00

backend_request_func.py

[Frontend] Add --logprobs argument to benchmark_serving.py (#8191)

2024-09-06 09:01:14 -07:00

benchmark_latency.py

[Frontend] Refactor prompt processing (#4028)

2024-07-22 10:13:53 -07:00

benchmark_prefix_caching.py

[Misc] Enhance prefix-caching benchmark tool (#6568)

2024-08-22 09:32:02 -07:00

benchmark_serving.py

Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241)

2024-09-06 20:18:16 -07:00

benchmark_throughput.py

[Benchmark] Add --async-engine option to benchmark_throughput.py (#7964)

2024-09-03 20:57:41 -04:00

launch_tgi_server.sh

[benchmark] Update TGI version (#7917)

2024-08-27 15:07:53 -07:00

README.md

Change the name to vLLM (#150)

2023-06-17 03:07:40 -07:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarking vLLM

Downloading the ShareGPT dataset

You can download the dataset by running:

wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
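
If wget is not available, the same download can be sketched in Python using only the standard library. This is a minimal sketch, not part of the benchmark scripts: the helper names download_dataset and count_entries are hypothetical, and the entry count assumes the file is a top-level JSON list.

```python
import json
import shutil
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
SHAREGPT_URL = (
    "https://huggingface.co/datasets/anon8231489123/"
    "ShareGPT_Vicuna_unfiltered/resolve/main/"
    "ShareGPT_V3_unfiltered_cleaned_split.json"
)


def download_dataset(url: str = SHAREGPT_URL, dest_dir: str = ".") -> Path:
    """Hypothetical helper: fetch `url` into `dest_dir`, skipping existing files."""
    # Use the last path component of the URL as the local file name.
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete file.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest


def count_entries(path: Path) -> int:
    """Hypothetical sanity check: assumes the file holds a top-level JSON list."""
    with open(path, encoding="utf-8") as f:
        return len(json.load(f))
```

Calling download_dataset() with no arguments saves the file into the current directory under its original name, which matches what the wget command produces. The file is large, so count_entries loads it fully into memory and is only meant as a quick check that the download is valid JSON.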