vllm/engine at 911215528324a52d74a729335506aa5ec0a7cc65 - vllm - 丝路新云 Code Repository

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-07 23:02:15 +08:00

Kuntai Du 9112155283

[Perf] Use small max_num_batched_tokens for A100 (#17885)

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

2025-05-11 07:53:23 +00:00

multiprocessing

Improve exception reporting in MP engine (#17800)

2025-05-08 05:32:39 +00:00

output_processor

Add full API docs and improve the UX of navigating them (#17485)

2025-05-03 19:42:43 -07:00

__init__.py

Change the name to vLLM (#150)

2023-06-17 03:07:40 -07:00

arg_utils.py

[Perf] Use small max_num_batched_tokens for A100 (#17885)

2025-05-11 07:53:23 +00:00

async_llm_engine.py

Add full API docs and improve the UX of navigating them (#17485)

2025-05-03 19:42:43 -07:00

async_timeout.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

llm_engine.py

Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)

2025-05-07 00:07:30 -07:00

metrics_types.py

[V1][Metrics] Support vllm:cache_config_info (#13299)

2025-02-22 00:20:00 -08:00

metrics.py

[Metrics] Fix minor inconsistencies in bucket progression (#17262)

2025-04-27 16:19:39 +00:00

protocol.py

[Misc] Clean up input processing (#17582)

2025-05-02 08:11:53 -07:00