| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Zhuohan Li | f7389f4763 | [Doc] Add Baichuan 13B to supported models (#656) | 2023-08-02 16:45:12 -07:00 |
| Woosuk Kwon | 55fe8a81ec | Refactor scheduler (#658) | 2023-08-02 16:42:01 -07:00 |
| YHPeter | e8ddc08ec8 | [BUG FIX] upgrade fschat version to 0.2.23 (#650) (Co-authored-by: hao.yu <hao.yu@cn-c017.server.mila.quebec>) | 2023-08-02 14:05:59 -07:00 |
| Zhuohan Li | 1b0bd0fe8a | Add Falcon support (new) (#592) | 2023-08-02 14:04:39 -07:00 |
| Lily Liu | 20044cab7a | Fix log message in scheduler (#652) | 2023-08-02 13:35:10 -07:00 |
| Song | 64f23c2900 | fix baichuan for different position embedding for 7b and 13b models (#643) | 2023-08-01 22:22:51 -07:00 |
| Qing | d4c7755ca8 | fix biachuan-7b tp (#598) (Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>) | 2023-08-01 15:41:36 -07:00 |
| Chaofan Lin | aa39e42c5a | fix doc (#622) | 2023-07-31 13:11:57 -07:00 |
| Fang li | 953f28cf9a | fix ModuleNotFoundError (#599) (Co-authored-by: fangli <fangli@tencent.com>) | 2023-07-29 20:52:41 -07:00 |
| Xudong Zhang | c0d00f5be6 | [Fix] fix import error of RayWorker (#604) (#605) | 2023-07-27 23:37:40 -07:00 |
| Zhuohan Li | 58a072be15 | [Fix] Add model sequence length into model config (#575) | 2023-07-25 23:46:30 -07:00 |
| Zhuohan Li | 82ad323dee | [Fix] Add chat completion Example and simplify dependencies (#576) | 2023-07-25 23:45:48 -07:00 |
| Zhuohan Li | df5dd3c68e | Add Baichuan-7B to README (#494) | 2023-07-25 15:25:12 -07:00 |
| MoeedDar | 2d867b55fa | fixed tensor parallel is not defined (#564) | 2023-07-25 14:16:51 -07:00 |
| Tao Peng | d7a1c6d614 | Fix paged attention testing. (#495) (Signed-off-by: Tao Peng <jiankeng.pt@alibaba-inc.com>) | 2023-07-24 21:01:56 -07:00 |
| Zhuohan Li | 7d5a155e4a | [Fix] Fix GPTBigcoder for distributed execution (#503) | 2023-07-24 18:36:33 -07:00 |
| leegohi04517 | 1dde34e0f8 | GPTJConfig has no attribute rotary. (#532) | 2023-07-24 11:29:30 -07:00 |
| Zhuohan Li | 6fc2a38b11 | Add support for LLaMA-2 (#505) | 2023-07-20 11:38:27 -07:00 |
| Antoni Baum | c487a221ee | Fix bad assert in initialize_cluster if PG already exists (#526) | 2023-07-19 23:17:12 -07:00 |
| Antoni Baum | 9925c17940 | Ray placement group support (#397) | 2023-07-19 22:49:31 -07:00 |
| Ricardo Lu | 8c4b2592fb | fix: enable trust-remote-code in api server & benchmark. (#509) | 2023-07-19 17:06:15 -07:00 |
| WRH | cf21a9bd5c | support trust_remote_code in benchmark (#518) | 2023-07-19 17:02:40 -07:00 |
| Massimiliano Pronesti | 16c3e295a8 | fix(ray_utils): ignore re-init error (#465) | 2023-07-19 17:01:19 -07:00 |
| Song | bda41c70dd | hotfix attn alibi wo head mapping (#496) (Co-authored-by: oliveryuan <oliveryuan@basemind.com>) | 2023-07-18 11:31:48 -07:00 |
| Lily Liu | 453bafb96f | Merge pull request #498 from MoeedDar/main (Fixed old name reference for max_seq_len) | 2023-07-18 09:22:56 -07:00 |
| MoeedDar | 328d231c17 | Fixed old name reference for max_seq_len | 2023-07-18 16:47:59 +01:00 |
| Lily Liu | b4b195b360 | fix max seq len (#489) | 2023-07-17 23:20:20 -07:00 |
| codethazine | 20b0d88d16 | Add support for baichuan (#365) | 2023-07-17 13:50:55 -07:00 |
| Zhuohan Li | 2bdea7ac11 | [Fix] Fix the condition of max_seq_len (#477) | 2023-07-17 00:33:48 -04:00 |
| Zhanghao Wu | 58df2883cb | [Doc] Add doc for running vLLM on the cloud (#426) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-07-16 13:37:14 -07:00 |
| Zhangir Azerbayev | 6d7d95a70a | Offload port selection to OS (#467) | 2023-07-15 23:11:02 -07:00 |
| Zhuohan Li | 96853af5a8 | Optimize MQA Kernel (#452) | 2023-07-14 20:06:40 -04:00 |
| Wen Sun | dbed69058c | Fix the KeyError when loading bloom-based models (#441) | 2023-07-13 21:58:09 -07:00 |
| panda | 7b6ae94059 | add vocab padding for LLama(Support WizardLM) (#411) | 2023-07-13 23:56:22 -04:00 |
| xcnick | c6dfc3cdbe | Fix handling of special tokens in decoding. (#418) | 2023-07-12 11:14:56 -04:00 |
| Keming | 51be365143 | fix: freeze pydantic to v1 (#429) | 2023-07-12 11:10:55 -04:00 |
| Andre Slavescu | c894836108 | [Model] Add support for GPT-J (#226) (Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>) | 2023-07-08 17:55:16 -07:00 |
| Fazlul Shahriar | 75beba29b5 | Don't try to load training_args.bin (#373) | 2023-07-08 15:26:28 -07:00 |
| Woosuk Kwon | ddfdf470ae | Add trust_remote_code arg to get_config (#405) | 2023-07-08 15:24:17 -07:00 |
| Woosuk Kwon | b6fbb9a565 | Sort the outputs before return (#402) | 2023-07-08 14:48:18 -07:00 |
| Lily Liu | 2179e4f4c5 | avoid python list copy in sequence initialization (#401) | 2023-07-08 12:42:08 -07:00 |
| codethazine | a945fcc2ae | Add trust-remote-code flag to handle remote tokenizers (#364) | 2023-07-07 11:04:58 -07:00 |
| Nicolas Frenay | be54f8e5c4 | [Fix] Change /generate response-type to json for non-streaming (#374) | 2023-07-06 18:15:17 -07:00 |
| Ricardo Lu | b396cb4998 | fix: only response [DONE] once when streaming response. (#378) | 2023-07-06 18:08:40 -07:00 |
| Woosuk Kwon | 1c395b4eaa | Bump up the version (#300) (tag: v0.1.2) | 2023-07-04 21:41:53 -07:00 |
| akxxsb | 3d64cf019e | [Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) | 2023-07-04 21:39:59 -07:00 |
| Zhuohan Li | 98fe8cb542 | [Server] Add option to specify chat template for chat endpoint (#345) | 2023-07-03 23:01:56 -07:00 |
| Woosuk Kwon | ffa6d2f9f9 | [Docs] Fix typo (#346) | 2023-07-03 16:51:47 -07:00 |
| Woosuk Kwon | 404422f42e | [Model] Add support for MPT (#334) | 2023-07-03 16:47:53 -07:00 |
| coolcloudcol | 7717d0838b | Fix an endless loop issue when engine_step throws a RuntimeError (#339) | 2023-07-03 15:22:28 -07:00 |