Lily Liu
453bafb96f
Merge pull request #498 from MoeedDar/main
Fixed old name reference for max_seq_len
2023-07-18 09:22:56 -07:00
MoeedDar
328d231c17
Fixed old name reference for max_seq_len
2023-07-18 16:47:59 +01:00
Lily Liu
b4b195b360
fix max seq len ( #489 )
2023-07-17 23:20:20 -07:00
codethazine
20b0d88d16
Add support for Baichuan ( #365 )
2023-07-17 13:50:55 -07:00
Zhuohan Li
2bdea7ac11
[Fix] Fix the condition of max_seq_len ( #477 )
2023-07-17 00:33:48 -04:00
Zhanghao Wu
58df2883cb
[Doc] Add doc for running vLLM on the cloud ( #426 )
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00
Zhangir Azerbayev
6d7d95a70a
Offload port selection to OS ( #467 )
2023-07-15 23:11:02 -07:00
Zhuohan Li
96853af5a8
Optimize MQA Kernel ( #452 )
2023-07-14 20:06:40 -04:00
Wen Sun
dbed69058c
Fix the KeyError when loading bloom-based models ( #441 )
2023-07-13 21:58:09 -07:00
panda
7b6ae94059
Add vocab padding for LLaMA (Support WizardLM) ( #411 )
2023-07-13 23:56:22 -04:00
xcnick
c6dfc3cdbe
Fix handling of special tokens in decoding. ( #418 )
2023-07-12 11:14:56 -04:00
Keming
51be365143
fix: freeze pydantic to v1 ( #429 )
2023-07-12 11:10:55 -04:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J ( #226 )
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Fazlul Shahriar
75beba29b5
Don't try to load training_args.bin ( #373 )
2023-07-08 15:26:28 -07:00
Woosuk Kwon
ddfdf470ae
Add trust_remote_code arg to get_config ( #405 )
2023-07-08 15:24:17 -07:00
Woosuk Kwon
b6fbb9a565
Sort the outputs before return ( #402 )
2023-07-08 14:48:18 -07:00
Lily Liu
2179e4f4c5
avoid python list copy in sequence initialization ( #401 )
2023-07-08 12:42:08 -07:00
codethazine
a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers ( #364 )
2023-07-07 11:04:58 -07:00
Nicolas Frenay
be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming ( #374 )
2023-07-06 18:15:17 -07:00
Ricardo Lu
b396cb4998
fix: only send [DONE] once in streaming responses. ( #378 )
2023-07-06 18:08:40 -07:00
Woosuk Kwon
1c395b4eaa
Bump up the version ( #300 )
v0.1.2
2023-07-04 21:41:53 -07:00
akxxsb
3d64cf019e
[Server] Use the fastchat.model.model_adapter.get_conversation_template method to get the model template ( #357 )
2023-07-04 21:39:59 -07:00
Zhuohan Li
98fe8cb542
[Server] Add option to specify chat template for chat endpoint ( #345 )
2023-07-03 23:01:56 -07:00
Woosuk Kwon
ffa6d2f9f9
[Docs] Fix typo ( #346 )
2023-07-03 16:51:47 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT ( #334 )
2023-07-03 16:47:53 -07:00
coolcloudcol
7717d0838b
Fix an endless loop issue when engine_step throws a RuntimeError ( #339 )
2023-07-03 15:22:28 -07:00
Zhuohan Li
42e0c1df78
[Quality] Add CI for formatting ( #343 )
2023-07-03 14:50:56 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM ( #331 )
2023-07-03 13:12:35 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter ( #326 )
2023-07-03 11:31:55 -07:00
Zhuohan Li
0ffded812a
[Fix] Better error message for batched prompts ( #342 )
2023-07-03 09:27:31 -07:00
Michele Catalano
0bd2a573a5
Allow sending a list of str as the prompt on the OpenAI demo endpoint /v1/completions ( #323 )
* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
Ricardo Lu
49b26e2cec
feat: add ChatCompletion endpoint in OpenAI demo server. ( #330 )
2023-07-02 22:54:33 -07:00
Lily Liu
dafd924c1f
Raise error for long prompt ( #273 )
2023-06-30 18:48:49 -07:00
Zhuohan Li
598dc4b79a
[Fix] Weight loading for GPTBigCode ( #313 )
2023-06-29 22:14:17 -07:00
Zhuohan Li
85de093472
[Fix] Do not pin memory when in WSL ( #312 )
2023-06-29 15:00:21 -07:00
Zhanghao Wu
f72297562f
Add news for the vllm+skypilot example ( #314 )
2023-06-29 12:32:37 -07:00
Bayang
9d27b09d12
Update README.md ( #306 )
2023-06-29 06:52:15 -07:00
Woosuk Kwon
998d9d1509
[Tokenizer] Add tokenizer mode ( #298 )
2023-06-28 14:19:22 -07:00
Lily Liu
425040d4c1
remove floats == 0 comparison ( #285 )
2023-06-28 14:11:51 -07:00
Woosuk Kwon
4338cc4750
[Tokenizer] Add an option to specify tokenizer ( #284 )
2023-06-28 09:46:58 -07:00
Jishnu Ray Chowdhury
bdd6b4c8bc
Add LLM.set_tokenizer ( #283 )
2023-06-28 00:28:29 -07:00
Cody Yu
2b7d3aca2e
Update setup.py ( #282 )
Co-authored-by: neubig <neubig@gmail.com>
2023-06-27 14:34:23 -07:00
twaka
4026a049d3
expand coverage of gpt2 model loading ( #271 )
2023-06-27 06:27:41 -07:00
Zhuohan Li
43710e8d09
[Fix] Fix default port number in benchmark scripts ( #265 )
2023-06-26 13:15:35 -07:00
Woosuk Kwon
526df28fb2
[BugFix] Fix a bug in counting running sequences ( #266 )
2023-06-26 13:09:02 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference ( #261 )
2023-06-26 11:34:23 -07:00
Zhuohan Li
0b7db411b5
[Bug] Fix the OOM condition for CPU cache ( #260 )
2023-06-26 11:16:13 -07:00
BasicCoder
471a7a4566
Compatible with the Decapoda Research LLaMA HF version ( #251 )
2023-06-26 09:23:57 -07:00
Lianmin Zheng
6214dd6ce9
Update README.md ( #236 )
2023-06-25 16:58:06 -07:00
metacryptom
0603379863
Fix incorrect use of getattr to get a dict value ( #232 )
2023-06-24 22:00:24 -07:00