85 Commits

Author SHA1 Message Date
Elisei Smirnov
e3470f8753
[Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985)
Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
2024-05-23 22:04:24 +00:00
sasha0552
69909126a7
[Bugfix] Use random seed if seed is -1 (#4531) 2024-05-01 10:41:17 -07:00
Li, Jiang
dd1a50a8bc
[Bugfix][Minor] Make ignore_eos effective (#4468) 2024-04-30 16:33:33 -07:00
Nick Hill
81661da7b2
[BugFix] Fix min_tokens when eos_token_id is None (#4389)
Co-authored-by: DefTruth <31974251+deftruth@users.noreply.github.com>
2024-04-27 09:52:46 -07:00
Simon Mo
a134ef6f5e
Support eos_token_id from generation_config.json (#4182) 2024-04-19 04:13:36 +00:00
SangBin Cho
09473ee41c
[mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
Nick Hill
e46a60aa4c
[BugFix] Fix handling of stop strings and stop token ids (#3672) 2024-04-11 15:34:12 -07:00
Thomas Parnell
1d7c940d74
Add option to completion API to truncate prompt tokens (#3144) 2024-04-05 10:15:42 -07:00
Matthias Gerstgrasser
aabe8f40f2
[Core] [Frontend] Make detokenization optional (#3749)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-04-03 21:52:18 -07:00
Travis Johnson
c13ad1b7bd
feat: implement the min_tokens sampling parameter (#3124)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-03-25 10:14:26 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Nick Hill
29a8d6a554
[Fix] Don't deep-copy LogitsProcessors when copying SamplingParams (#3099) 2024-02-29 19:20:42 +00:00
Nick Hill
7d2dcce175
Support per-request seed (#2514) 2024-02-21 11:47:00 -08:00
Nikola Borisov
3209b49033
[Bugfix] fix crash if max_tokens=None (#2570) 2024-01-23 22:38:55 -08:00
Roy
9140561059
[Minor] Fix typo and remove unused code (#2305) 2024-01-02 19:23:15 -08:00
Yunfeng Bai
c06170cc8e
Add a flag to include stop string in output text (#1976) 2023-12-15 00:45:58 -08:00
Roy
60dc62dc9e
add custom server params (#1868) 2023-12-03 12:59:18 -08:00
Jerry
f86bd6190a
Fix the typo in SamplingParams' docstring (#1886) 2023-12-01 02:06:36 -08:00
ljss
de23687d16
Fix repetition penalty aligned with huggingface (#1577) 2023-11-22 14:41:44 -08:00
ljss
4cea74c73b
Set top_p=0 and top_k=-1 in greedy sampling (#1748) 2023-11-22 12:51:09 -08:00
陈序
094f716bf2
Add stop_token_ids in SamplingParams.__repr__ (#1745) 2023-11-21 20:13:53 -08:00
Roy
e87557b069
Support Min P Sampler (#1642) 2023-11-17 16:20:49 -08:00
Noam Gat
555bdcc5a3
Added logits processor API to sampling params (#1469) 2023-11-03 14:12:15 -07:00
Dan Lord
7013a80170
Add support for spaces_between_special_tokens 2023-10-30 16:52:56 -07:00
ljss
69be658bba
Support repetition_penalty (#1424) 2023-10-29 10:02:41 -07:00
Zhuohan Li
9d9072a069
Implement prompt logprobs & Batched topk for computing logprobs (#1328)
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
2023-10-16 10:56:50 -07:00
Woosuk Kwon
84e4e37d14
[Minor] Fix type annotations (#1238) 2023-10-02 15:28:31 -07:00
Dan Lord
20f7cc4cde
Add skip_special_tokens sampling params (#1186) 2023-09-27 19:21:42 -07:00
Zhuohan Li
947b794146
[Sampler] Vectorized sampling (simplified) (#1048)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-22 17:48:04 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. (#1097) 2023-09-21 15:34:02 -07:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
wangcx18
0c04ce3234
Fix typo in sampling_params.py (#788) 2023-08-18 10:12:46 +09:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Lily Liu
425040d4c1
remove floats == 0 comparison (#285) 2023-06-28 14:11:51 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00