4593 Commits

Author SHA1 Message Date
xcnick
c6dfc3cdbe
Fix handling of special tokens in decoding. (#418) 2023-07-12 11:14:56 -04:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Fazlul Shahriar
75beba29b5
Don't try to load training_args.bin (#373) 2023-07-08 15:26:28 -07:00
Woosuk Kwon
ddfdf470ae
Add trust_remote_code arg to get_config (#405) 2023-07-08 15:24:17 -07:00
Woosuk Kwon
b6fbb9a565
Sort the outputs before return (#402) 2023-07-08 14:48:18 -07:00
Lily Liu
2179e4f4c5
avoid python list copy in sequence initialization (#401) 2023-07-08 12:42:08 -07:00
codethazine
a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers (#364) 2023-07-07 11:04:58 -07:00
Nicolas Frenay
be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming (#374) 2023-07-06 18:15:17 -07:00
Ricardo Lu
b396cb4998
fix: only response [DONE] once when streaming response. (#378) 2023-07-06 18:08:40 -07:00
Woosuk Kwon
1c395b4eaa
Bump up the version (#300) 2023-07-04 21:41:53 -07:00
akxxsb
3d64cf019e
[Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) 2023-07-04 21:39:59 -07:00
Zhuohan Li
98fe8cb542
[Server] Add option to specify chat template for chat endpoint (#345) 2023-07-03 23:01:56 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
coolcloudcol
7717d0838b
Fix an endless loop issue when engine_step throws a RuntimeError (#339) 2023-07-03 15:22:28 -07:00
Zhuohan Li
42e0c1df78
[Quality] Add CI for formatting (#343) 2023-07-03 14:50:56 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Zhuohan Li
0ffded812a
[Fix] Better error message for batched prompts (#342) 2023-07-03 09:27:31 -07:00
Michele Catalano
0bd2a573a5
Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323)
* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
Ricardo Lu
49b26e2cec
feat: add ChatCompletion endpoint in OpenAI demo server. (#330) 2023-07-02 22:54:33 -07:00
Lily Liu
dafd924c1f
Raise error for long prompt (#273) 2023-06-30 18:48:49 -07:00
Zhuohan Li
598dc4b79a
[Fix] Weight loading for GPTBigCode (#313) 2023-06-29 22:14:17 -07:00
Zhuohan Li
85de093472
[Fix] Do not pin memory when in WSL (#312) 2023-06-29 15:00:21 -07:00
Woosuk Kwon
998d9d1509
[Tokenizer] Add tokenizer mode (#298) 2023-06-28 14:19:22 -07:00
Lily Liu
425040d4c1
remove floats == 0 comparison (#285) 2023-06-28 14:11:51 -07:00
Woosuk Kwon
4338cc4750
[Tokenizer] Add an option to specify tokenizer (#284) 2023-06-28 09:46:58 -07:00
Jishnu Ray Chowdhury
bdd6b4c8bc
Add LLM.set_tokenizer (#283) 2023-06-28 00:28:29 -07:00
twaka
4026a049d3
expand coverage of gpt2 model loading (#271) 2023-06-27 06:27:41 -07:00
Woosuk Kwon
526df28fb2
[BugFix] Fix a bug in counting running sequences (#266) 2023-06-26 13:09:02 -07:00
Zhuohan Li
0b7db411b5
[Bug] Fix the OOM condition for CPU cache (#260) 2023-06-26 11:16:13 -07:00
BasicCoder
471a7a4566
Compatible with Decapoda Research llama hf version (#251) 2023-06-26 09:23:57 -07:00
metacryptom
0603379863
fix wrong using getattr to get dict value (#232) 2023-06-24 22:00:24 -07:00
Michael Feil
298695b766
GPTBigCode (StarCoder, SantaCoder Support) (#209) 2023-06-23 01:49:27 +08:00
Zhuohan Li
83658c8ace
Bump up version to 0.1.1 (#204) 2023-06-22 15:33:32 +08:00
Zhuohan Li
1d24ccb96c
[Fix] Better error message when there is OOM during cache initialization (#203) 2023-06-22 15:30:06 +08:00
Woosuk Kwon
14f0b39cda
[Bugfix] Fix a bug in RequestOutput.finished (#202) 2023-06-22 00:17:24 -07:00
Zhuohan Li
2e0d314384
fix-ray (#193) 2023-06-22 00:21:41 +08:00
Woosuk Kwon
67d96c29fb
Use slow tokenizer for open llama models (#168) 2023-06-20 14:19:47 +08:00
Woosuk Kwon
7e2a913c64
[Minor] Fix CompletionOutput.__repr__ (#157) 2023-06-18 19:58:25 -07:00
Woosuk Kwon
3f92038b99
Add comments on swap space (#154) 2023-06-18 11:39:35 -07:00
Zhuohan Li
bf5f121c02
Reduce GPU memory utilization to make sure OOM doesn't happen (#153) 2023-06-18 17:33:50 +08:00
Zhuohan Li
bec7b2dc26
Add quickstart guide (#148) 2023-06-18 01:26:12 +08:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00