vllm/vllm at 928de46888b9b257dfa491047a7d9cd199ca585b - vllm

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-19 17:14:33 +08:00

Woosuk Kwon 928de46888

Implement PagedAttention V2 (#1348)

2023-10-16 00:59:57 -07:00

core

Use monotonic time where appropriate (#1249)

2023-10-02 19:22:05 -07:00

engine

Use monotonic time where appropriate (#1249)

2023-10-02 19:22:05 -07:00

entrypoints

API server support IPv4 / IPv6 dualstack (#1288)

2023-10-07 15:15:54 -07:00

model_executor

Implement PagedAttention V2 (#1348)

2023-10-16 00:59:57 -07:00

transformers_utils

Fix the issue for AquilaChat2-* models (#1339)

2023-10-13 11:51:29 -07:00

worker

Move bfloat16 check to worker (#1259)

2023-10-07 22:10:44 -07:00

__init__.py

Bump up the version to v0.2.0 (#1212)

2023-09-28 15:30:38 -07:00

block.py

[Quality] Add code formatter and linter (#326)

2023-07-03 11:31:55 -07:00

config.py

Move bfloat16 check to worker (#1259)

2023-10-07 22:10:44 -07:00

logger.py

[Quality] Add code formatter and linter (#326)

2023-07-03 11:31:55 -07:00

outputs.py

Align vLLM's beam search implementation with HF generate (#857)

2023-09-04 17:29:42 -07:00

sampling_params.py

[Minor] Fix type annotations (#1238)

2023-10-02 15:28:31 -07:00

sequence.py

Fix __repr__ of SequenceOutputs (#1311)

2023-10-10 09:58:28 -07:00

utils.py

Allocate more shared memory to attention kernel (#1154)

2023-09-26 22:27:13 -07:00