vllm/vllm/engine
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-17 03:25:30 +08:00)

Latest commit: 76515f303b by Nick Hill
[Frontend] Use MQLLMEngine for embeddings models too (#8584)
2024-09-19 12:51:06 -04:00
Name                  Last commit                                                                                    Date
multiprocessing       [Frontend] Use MQLLMEngine for embeddings models too (#8584)                                  2024-09-19 12:51:06 -04:00
output_processor      [Core] Optimize Async + Multi-step (#8050)                                                    2024-09-03 18:50:29 +00:00
__init__.py           Change the name to vLLM (#150)                                                                2023-06-17 03:07:40 -07:00
arg_utils.py          [Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)   2024-09-17 07:35:01 -07:00
async_llm_engine.py   [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)                        2024-09-18 13:56:58 +00:00
async_timeout.py      [Bugfix] AsyncLLMEngine hangs with asyncio.run (#5654)                                        2024-06-19 13:57:12 -07:00
llm_engine.py         [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)                        2024-09-18 13:56:58 +00:00
metrics_types.py      [MISC] Add prefix cache hit rate to metrics (#7606)                                           2024-08-19 11:52:07 -07:00
metrics.py            [MISC] Add prefix cache hit rate to metrics (#7606)                                           2024-08-19 11:52:07 -07:00
protocol.py           [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)                        2024-09-18 13:56:58 +00:00