xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-01-19 18:14:30 +08:00)
vllm/vllm/v1/engine
Latest commit: bdce64f236 by Rui Qiao, "[V1] Support DP with Ray (#18779)" (2025-06-02 21:15:13 -07:00)
| File | Last commit | Date |
| ---- | ----------- | ---- |
| `__init__.py` | [Perf] API-server scaleout with many-to-many server-engine comms (#17546) | 2025-05-30 08:17:00 -07:00 |
| `async_llm.py` | [V1] Support DP with Ray (#18779) | 2025-06-02 21:15:13 -07:00 |
| `coordinator.py` | [Perf] API-server scaleout with many-to-many server-engine comms (#17546) | 2025-05-30 08:17:00 -07:00 |
| `core_client.py` | [V1] Support DP with Ray (#18779) | 2025-06-02 21:15:13 -07:00 |
| `core.py` | [V1] Support DP with Ray (#18779) | 2025-06-02 21:15:13 -07:00 |
| `detokenizer.py` | Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158) | 2025-04-25 17:10:32 +08:00 |
| `exceptions.py` | [V1][Frontend] Improve Shutdown And Logs (#11737) | 2025-04-16 19:48:34 -07:00 |
| `llm_engine.py` | [V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010) | 2025-05-27 09:37:06 +00:00 |
| `logprobs.py` | [V1] Aggregate chunked prompt logprobs in model runner (#14875) | 2025-03-24 12:27:57 -04:00 |
| `mm_input_cache.py` | [Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) | 2025-05-12 22:40:19 -07:00 |
| `output_processor.py` | [Feature][V1]: supports cached_tokens in response usage (#18149) | 2025-05-23 01:41:03 -07:00 |
| `parallel_sampling.py` | [V1] Avoid redundant input processing in n>1 case (#14985) | 2025-03-20 22:24:10 -07:00 |
| `processor.py` | [Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) | 2025-05-12 22:40:19 -07:00 |