Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-01-22 21:34:38 +08:00)
vllm/vllm/engine
Latest commit: a41351f363 by rasmith (2025-04-25 00:45:02 -07:00)
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Name | Last commit | Date
multiprocessing/ | Simplify TokenizerGroup (#16790) | 2025-04-24 04:43:56 -07:00
output_processor/ | [BugFix] fix some typos found by typos. (#16314) | 2025-04-09 03:43:59 -07:00
__init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
arg_utils.py | [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734) | 2025-04-25 00:45:02 -07:00
async_llm_engine.py | Add collective_rpc to llm engine (#16999) | 2025-04-24 20:16:52 +00:00
async_timeout.py | [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00
llm_engine.py | Simplify TokenizerGroup (#16790) | 2025-04-24 04:43:56 -07:00
metrics_types.py | [V1][Metrics] Support vllm:cache_config_info (#13299) | 2025-02-22 00:20:00 -08:00
metrics.py | [Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token (#15202) | 2025-04-06 20:34:51 -07:00
protocol.py | Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572) | 2025-04-15 11:50:38 +00:00