xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-12 15:06:02 +08:00

Author	SHA1	Message	Date
Hyogeun Oh (오효근)	4e4d017b6f	[Docs] Fix warnings in `mkdocs build` (continued) (#23743 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>	2025-08-27 17:17:29 +00:00
Chenguang Zheng	d765cf01fe	[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-25 00:41:17 -07:00
Nick Hill	c80c53a30f	[BugFix] Fix batch updates for pooling models (#23398 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-23 08:20:41 +08:00
Nick Hill	603fbbbce0	[Misc] Misc code cleanup/simplification (#23304 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-21 17:22:55 +00:00
wang.yuqi	d70a16625d	[Performance] V1 Pooling Models E2E Performance Optimization (#23162 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-21 13:26:09 +00:00
Cyrus Leung	27e8d1ea3e	[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-18 09:52:00 +00:00
Cyrus Leung	5c32143b9d	[Refactor] Defer tensor data construction in MultiModalKwargs (#23030 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-16 21:05:50 -07:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
Cyrus Leung	19b927e52d	[Core] Use individual MM items in P0/P1 cache and model runner (#22570 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 07:18:07 -07:00
22quinn	54de71d0df	[Sampler] Support returning all logprobs or logits (#21792 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-04 03:04:12 -07:00
Lu Fang	accac82928	[Sampler] Introduce logprobs mode for logging (#21398 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-07-23 01:39:25 -07:00
Cyrus Leung	45badd05d0	[Core] Set pooling params based on task and model (#21128 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 05:41:17 -07:00
afeldman-nm	48fb076cbc	[V1] LogitsProcessor programming model (#16728 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-02 09:10:42 -07:00
Isotr0py	61f4fc5dc6	[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-23 18:38:06 +00:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
afeldman-nm	19a53b2783	[V1] Decouple GPU and TPU `InputBatch` (#19778 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com>	2025-06-18 06:38:13 +00:00
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 ) (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Chen Zhang	e60f550b38	[v1] Support multiple KV cache groups in GPU model runner (#17945 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 18:54:54 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Woosuk Kwon	e75a6301bd	[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-01 12:33:16 -07:00
Cyrus Leung	c6bc0034d0	[Misc] Remove unused utils and clean up imports (#15708 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-28 09:41:16 -07:00
Nick Hill	3aee6573dc	[V1] Aggregate chunked prompt logprobs in model runner (#14875 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-24 12:27:57 -04:00
Nick Hill	fc1f67715d	[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-16 14:53:34 -07:00
22quinn	eb8b5eb183	[V1] Support bad_words in sampler (#13376 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-08 14:50:26 -08:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Nick Hill	ac60dc7fe1	[V1][BugFix] Fix for mixed top_k batch (#14301 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>	2025-03-05 20:43:04 +00:00
Nick Hill	a32c8669ca	[V1][Minor] Remove obsolete FIXME comment (#14304 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-05 11:59:23 -08:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Lu Fang	8d6cd32b7b	[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 08:49:44 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Chen Zhang	e7bd944e08	[v1] Cleanup the BlockTable in InputBatch (#13977 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-02-28 19:03:16 +00:00
Yang Chen	58d1b2aa77	[Attention] MLA support for V1 (#13789 ) Signed-off-by: Yang Chen <yangche@fb.com>	2025-02-27 13:14:17 -05:00
Lily Liu	5629f26df7	[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729 )	2025-02-25 18:14:48 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Nick Hill	31aa045c11	[V1][Sampler] Avoid an operation during temperature application (#13587 )	2025-02-20 22:05:56 -08:00
Nick Hill	30172b4947	[V1] Optimize handling of sampling metadata and req_ids list (#13244 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-18 12:15:33 -08:00
Woosuk Kwon	cd4a72a28d	[V1][Spec decode] Move drafter to model runner (#13363 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-17 15:40:12 -08:00
Lily Liu	80f63a3966	[V1][Spec Decode] Ngram Spec Decode (#12193 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-15 18:05:11 -08:00
Aoyu	a12934d3ec	[V1][Core] min_p sampling support (#13191 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com>	2025-02-14 15:50:05 -08:00
Lu Fang	6224a9f620	Support logit_bias in v1 Sampler (#13079 )	2025-02-14 04:34:59 -08:00
afeldman-nm	0630d4537a	[V1] Logprobs and prompt logprobs support (#9880 ) This PR is adding support for sample logprobs & prompt logprobs to vLLM v1. New behavior: - During model execution, model runner computes sample logprobs (if user-provided logprobs setting is not None) and prompt logprobs (if user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns 3 vectors: token ids, token logprob values, token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order. - In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure which is transferred to the engine client. If multiprocessing is enabled, then sample and prompt logprobs will be (de)serialized when the EngineCoreOutput data structure is (de)serialized. - During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprobs values, and token ranks into the OpenAI-compatible List[Dict[token id,Logprob]] format (for sample and prompt logprobs respectively.) - Each Logprob instance (whether sample- or prompt-) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor not the detokenizer. Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-07 07:26:20 -08:00
Varun Sundar Rabindranath	467a96a541	[V1] LoRA Support (#10957 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-06 09:32:51 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Roger Wang	81763c58a0	[V1] Add V1 support of Qwen2-VL (#12128 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-19 19:52:13 +08:00
Woosuk Kwon	06bfb51963	[V1] Add BlockTable class (#11693 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-06 14:24:42 +09:00

1 2

57 Commits