xwjiang2010
|
64172a976c
|
[Feature] Add vision language model support. (#3042)
|
2024-03-25 14:16:30 -07:00 |
|
Swapnil Parekh
|
819924e749
|
[Core] Adding token ranks along with logprobs (#3516)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
|
2024-03-25 10:13:10 -07:00 |
|
SangBin Cho
|
01bfb22b41
|
[CI] Try introducing isort. (#3495)
|
2024-03-25 07:59:47 -07:00 |
|
Woosuk Kwon
|
925f3332ca
|
[Core] Refactor Attention Take 2 (#3462)
|
2024-03-25 04:39:33 +00:00 |
|
Antoni Baum
|
426ec4ec67
|
[1/n] Triton sampling kernel (#3186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-03-20 14:45:08 -07:00 |
|
Zhuohan Li
|
2f8844ba08
|
Re-enable the 80 char line width limit (#3305)
|
2024-03-10 19:49:14 -07:00 |
|
Cade Daniel
|
8437bae6ef
|
[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103)
|
2024-03-08 23:32:46 -08:00 |
|
jacobthebanana
|
8cbba4622c
|
Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) (#3263)
|
2024-03-07 23:03:22 +00:00 |
|
Cade Daniel
|
a33ce60c66
|
[Testing] Fix core tests (#3224)
|
2024-03-06 01:04:23 -08:00 |
|
Nick Hill
|
8999ec3c16
|
Store eos_token_id in Sequence for easy access (#3166)
|
2024-03-05 15:35:43 -08:00 |
|
Antoni Baum
|
22de45235c
|
Push logprob generation to LLMEngine (#3065)
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
|
2024-03-04 19:54:06 +00:00 |
|
Zhuohan Li
|
996d095c54
|
[FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark (#3158)
|
2024-03-03 14:37:18 -08:00 |
|
Sage Moore
|
ce4f5a29fb
|
Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-03-02 00:50:01 -08:00 |
|
Nick Hill
|
7d2dcce175
|
Support per-request seed (#2514)
|
2024-02-21 11:47:00 -08:00 |
|
Antoni Baum
|
017d9f1515
|
Add metrics to RequestOutput (#2876)
|
2024-02-20 21:55:57 -08:00 |
|
zspo
|
0e163fce18
|
Fix default length_penalty to 1.0 (#2667)
|
2024-02-01 15:59:39 -08:00 |
|
Robert Shaw
|
93b38bea5d
|
Refactor Prometheus and Add Request Level Metrics (#2316)
|
2024-01-31 14:58:07 -08:00 |
|
Antoni Baum
|
9b945daaf1
|
[Experimental] Add multi-LoRA support (#1804)
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
|
2024-01-23 15:26:37 -08:00 |
|
zspo
|
4df417d059
|
fix: fix some args desc (#2487)
|
2024-01-18 09:41:44 -08:00 |
|
shiyi.c_98
|
d10f8e1d43
|
[Experimental] Prefix Caching Support (#1669)
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-01-17 16:32:10 -08:00 |
|
Zhuohan Li
|
708e6c18b0
|
[FIX] Fix class naming (#1803)
|
2023-11-28 14:08:01 -08:00 |
|
Woosuk Kwon
|
f8a1e39fae
|
[BugFix] Define __eq__ in SequenceGroupOutputs (#1389)
|
2023-10-17 01:09:44 -07:00 |
|
Zhuohan Li
|
9d9072a069
|
Implement prompt logprobs & Batched topk for computing logprobs (#1328)
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
|
2023-10-16 10:56:50 -07:00 |
|
Wang Ran (汪然)
|
ac5cf86aa6
|
Fix __repr__ of SequenceOutputs (#1311)
|
2023-10-10 09:58:28 -07:00 |
|
Zhuohan Li
|
6b5296aa3a
|
[FIX] Explain why the finished_reason of ignored sequences are length (#1289)
|
2023-10-08 15:22:38 -07:00 |
|
Zhuohan Li
|
f029ef94d7
|
Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068)
|
2023-09-18 11:49:40 -07:00 |
|
Zhuohan Li
|
f04908cae7
|
[FIX] Minor bug fixes (#1035)
* [FIX] Minor bug fixes
* Address review comments
|
2023-09-13 16:38:12 -07:00 |
|
Antoni Baum
|
9841d48a10
|
Use TGI-like incremental detokenization (#984)
|
2023-09-13 13:38:01 -07:00 |
|
Zhuohan Li
|
002800f081
|
Align vLLM's beam search implementation with HF generate (#857)
|
2023-09-04 17:29:42 -07:00 |
|
Lily Liu
|
2179e4f4c5
|
avoid python list copy in sequence initialization (#401)
|
2023-07-08 12:42:08 -07:00 |
|
Zhuohan Li
|
d6fa1be3a8
|
[Quality] Add code formatter and linter (#326)
|
2023-07-03 11:31:55 -07:00 |
|
Lily Liu
|
dafd924c1f
|
Raise error for long prompt (#273)
|
2023-06-30 18:48:49 -07:00 |
|
Woosuk Kwon
|
0b98ba15c7
|
Change the name to vLLM (#150)
|
2023-06-17 03:07:40 -07:00 |
|