xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-15 13:57:28 +08:00

Author	SHA1	Message	Date
SangBin Cho	0d62fe58db	[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)	2024-05-01 19:24:13 -07:00
leiwen83	24750f4cad	[Core] Enable prefix caching with block manager v2 enabled (#4142) Co-authored-by: Lei Wen <wenlei03@qiyi.com> Co-authored-by: Sage Moore <sagemoore@utexas.edu>	2024-05-01 11:20:32 -07:00
Pastel！	a822eb3413	[Misc] fix typo in block manager (#4453)	2024-04-30 20:41:32 -07:00
Ronen Schaffer	bf480c5302	Add more Prometheus metrics (#2764) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-04-28 15:59:33 -07:00
Caio Mendes	3da24c2df7	[Model] Phi-3 4k sliding window temp. fix (#4380)	2024-04-27 18:08:15 +08:00
SangBin Cho	603ad84815	[Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309)	2024-04-26 13:02:02 +00:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
SangBin Cho	050f285ff6	[Core] Scheduling optimization 2 (#4280)	2024-04-23 08:02:11 +00:00
SangBin Cho	0ae11f78ab	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)	2024-04-22 21:32:44 -07:00
SangBin Cho	ad8d696a99	[Core] Scheduler perf fix (#4270)	2024-04-22 21:11:06 +00:00
Cade Daniel	e95cd87959	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)	2024-04-16 13:09:21 -07:00
SangBin Cho	37e84a403d	[Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092)	2024-04-15 14:47:31 -07:00
SangBin Cho	09473ee41c	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
Zhuohan Li	d4ec9ffb95	[Misc] Fix typo in scheduler.py (#4022)	2024-04-12 13:56:04 -07:00
Michael Feil	c2b4a1bce9	[Doc] Add typing hints / mypy types cleanup (#3816) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-11 17:17:21 -07:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	2f19283549	[Core] latency optimization (#3890)	2024-04-06 19:14:06 -07:00
SangBin Cho	18de883489	[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853)	2024-04-05 10:17:58 -07:00
SangBin Cho	3dcb3e8b98	[3/N] Refactor scheduler for chunked prefill scheduling (#3550)	2024-04-03 14:13:49 -07:00
Michael Goin	b321d4881b	[Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` (#3798)	2024-04-02 12:35:31 -07:00
Cade Daniel	93deb0b38f	[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250)	2024-04-01 22:55:24 +00:00
Simon Mo	4716a32dd4	fix logging msg for block manager (#3701)	2024-03-28 23:29:55 +00:00
SangBin Cho	b51c1cc9d2	[2/N] Chunked prefill data update (#3538)	2024-03-28 10:06:01 -07:00
Cade Daniel	14ccd94c89	[Core][Bugfix]Refactor block manager for better testability (#3492)	2024-03-27 23:59:28 -07:00
xwjiang2010	64172a976c	[Feature] Add vision language model support. (#3042)	2024-03-25 14:16:30 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
TianYu GUO	e67c295b0c	[Bugfix] fix automatic prefix args and add log info (#3608)	2024-03-25 05:35:22 -07:00
Thomas Parnell	cf2f084d56	Dynamic scheduler delay to improve ITL performance (#3279) Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>	2024-03-22 12:28:14 -07:00
ElizaWszola	6ebd02bdef	[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431) Co-authored-by: rsnm2 <rshaw@neuralmagic.com> Co-authored-by: Luka <luka@paperspace>	2024-03-20 23:20:04 -07:00
SangBin Cho	6e435de766	[1/n][Chunked Prefill] Refactor input query shapes (#3236)	2024-03-20 14:46:05 -07:00
ElizaWszola	9474e89ba4	[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-20 00:11:11 -07:00
Tao He	14b8ae02e7	Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) Signed-off-by: Tao He <sighingnow@gmail.com> Co-authored-by: simon-mo <simon.mo@hey.com>	2024-03-15 18:25:43 +00:00
Breno Faria	49a3c8662b	Fixes #1556 double free (#3347)	2024-03-13 00:30:08 +00:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305)	2024-03-10 19:49:14 -07:00
ElizaWszola	b35cc93420	Fix auto prefix bug (#3239)	2024-03-07 16:37:28 -08:00
Nick Hill	8999ec3c16	Store `eos_token_id` in `Sequence` for easy access (#3166)	2024-03-05 15:35:43 -08:00
Zhuohan Li	996d095c54	[FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark (#3158)	2024-03-03 14:37:18 -08:00
Sage Moore	ce4f5a29fb	Add Automatic Prefix Caching (#2762) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-03-02 00:50:01 -08:00
Massimiliano Pronesti	93dc5a2870	chore(vllm): codespell for spell checking (#2820)	2024-02-21 18:56:01 -08:00
Nick Hill	7d2dcce175	Support per-request seed (#2514)	2024-02-21 11:47:00 -08:00
Antoni Baum	017d9f1515	Add metrics to RequestOutput (#2876)	2024-02-20 21:55:57 -08:00
Antoni Baum	9b945daaf1	[Experimental] Add multi-LoRA support (#1804) Co-authored-by: Chen Shen <scv119@gmail.com> Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-01-23 15:26:37 -08:00
Nick Hill	d75c40734a	[Fix] Keep `scheduler.running` as deque (#2523)	2024-01-20 22:36:09 -08:00
ljss	d2a68364c4	[BugFix] Fix abort_seq_group (#2463)	2024-01-18 15:10:42 -08:00
shiyi.c_98	d10f8e1d43	[Experimental] Prefix Caching Support (#1669) Co-authored-by: DouHappy <2278958187@qq.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-01-17 16:32:10 -08:00
陈序	48cf1e413c	fix: deque mutated during iteration in abort_seq_group (#2371)	2024-01-12 17:44:18 +01:00
Jiaxiang	6549aef245	[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011)	2024-01-11 19:26:49 -08:00
Nadav Shmayovits	05921a9a7a	Changed scheduler to use deques instead of lists (#2290) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-01-07 09:48:07 -08:00
Woosuk Kwon	a1b9cb2a34	[BugFix] Fix recovery logic for sequence group (#2186)	2023-12-20 21:52:37 -08:00
Zhuohan Li	1cb4ad8de9	[FIX] Fix formatting error	2023-11-29 00:40:19 +00:00

124 Commits