vllm/vllm (directory history)

Latest commit: a19bc5c628 by Woosuk Kwon, "Automatically configure max_num_batched_tokens (#1198)", 2023-09-27 16:34:00 -07:00
| Path | Last commit | Date |
| --- | --- | --- |
| core | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00 |
| engine | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00 |
| entrypoints | Align max_tokens behavior with openai (#852) | 2023-09-23 18:10:13 -07:00 |
| model_executor | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00 |
| transformers_utils | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00 |
| worker | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| __init__.py | Bump up the version to v0.1.7 (#1013) | 2023-09-11 00:54:30 -07:00 |
| block.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| config.py | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00 |
| logger.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| outputs.py | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00 |
| sampling_params.py | [Sampler] Vectorized sampling (simplified) (#1048) | 2023-09-22 17:48:04 -07:00 |
| sequence.py | Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) | 2023-09-18 11:49:40 -07:00 |
| utils.py | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
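
For orientation, a minimal sketch of how these modules surface through vLLM's documented offline-inference API around the v0.1.x releases listed above: `entrypoints` exposes the `LLM` class, `sampling_params.py` defines `SamplingParams`, and `outputs.py` defines the `RequestOutput` objects returned by `generate()`. The model name and parameter values below are illustrative assumptions, not taken from this listing.

```python
# Sketch of the user-facing API; the engine/, core/, model_executor/, and worker/
# packages are driven internally by the LLM entrypoint.
from vllm import LLM, SamplingParams

# SamplingParams (sampling_params.py) controls decoding behavior.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM (entrypoints) wraps the LLMEngine (engine/), which schedules requests
# (core/) and runs the model on workers (model_executor/, worker/).
llm = LLM(model="facebook/opt-125m")  # assumed example model

# generate() returns RequestOutput objects (outputs.py).
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```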