xinyun/vllm - vllm - Silk Road New Cloud (丝路新云) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-14 00:37:15 +08:00

Author	SHA1	Message	Date
shangmingc	50c9636d87	[V1][Usage] Refactor speculative decoding configuration and tests (#14434 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-22 19:28:10 -10:00
Bryan Lu	9ed6ee92d6	[Bugfix] EAGLE output norm bug (#14464 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-03-15 06:50:33 +00:00
pyc96	1e3e76b6cc	[Bugfix] Fix DeepSeek MTP crash when using TP1ModelRunner with CUDA graph due to shape mismatch (#14237 ) Signed-off-by: pyc96 <pychen96@gmail.com>	2025-03-05 22:22:40 +00:00
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
Simon Mo	8c755c3b6d	[bugfix] spec decode worker get tp group only when initialized (#13578 )	2025-02-20 04:46:28 +00:00
shangmingc	5ae9f26a5a	[Bugfix] Fix device ordinal for multi-node spec decode (#13269 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-19 22:13:15 +08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Cyrus Leung	5d2965b7d7	[Bugfix] Fix 2 Node and Spec Decode tests (#13341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-16 22:20:22 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Mengqing Cao	dd66fd2b01	[CI] fix pre-commit error (#12494 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-01-28 06:11:05 +00:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Nicolò Lucchesi	6116ca8cd7	[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (#10132 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: wallashss <wallashss@ibm.com> Co-authored-by: wallashss <wallashss@ibm.com>	2025-01-27 13:38:35 -08:00
Cyrus Leung	59a0192fb9	[Core] Interface for accessing model from `VllmRunner` (#10353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-20 15:00:59 +08:00
youkaichao	ad34c0df0f	[core] platform agnostic executor via collective_rpc (#11256 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-15 13:45:21 +08:00
Cyrus Leung	ee77fdb5de	[Doc][2/N] Reorganize Models and Usage sections (#11755 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 21:40:31 +08:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Cyrus Leung	c889d5888b	[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:20:49 +00:00
Cyrus Leung	aa39a8e175	[Doc] Create a new "Usage" section (#10827 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-05 11:19:35 +08:00
Yang Zheng	f6084f6324	[Speculative Decoding] Move indices to device before filtering output (#10850 ) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-12-03 17:01:39 +08:00
jeongin601	1bf905ddaa	[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. (#10198 ) Signed-off-by: jeongin601 <0200angela@gmail.com> Signed-off-by: jeong_in.bae <jeong_in.bae@navercorp.com>	2024-11-27 05:07:30 +00:00
Chendi.Xue	0a71900bc9	Remove hard-dependencies of Speculative decode to CUDA workers (#10587 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2024-11-26 17:57:11 -08:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
Sky Lee	2ec8827288	[Bugfix] Qwen-vl output is inconsistent in speculative decoding (#10350 )	2024-11-15 05:40:10 +00:00
Cyrus Leung	e0191a95d8	[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:31:02 +08:00
Nicolò Lucchesi	9d43afcc53	[Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2024-11-07 08:15:14 -08:00
Sungjae Lee	0c63c34f72	[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode (#9730 ) Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2024-11-06 01:45:45 +00:00
youkaichao	2094062b4e	[4.5/N] bugfix for quant config in speculative decode (#10007 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-04 15:11:59 -08:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
科英	67a6882da4	[Misc] SpecDecodeWorker supports profiling (#9719 ) Signed-off-by: Abatom <abatom@163.com>	2024-10-27 04:18:03 +00:00
Thomas Parnell	496e991da8	[Doc] Consistent naming of attention backends (#9498 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-10-21 22:29:57 +08:00
Lily Liu	8345045833	[Performance][Spec Decode] Optimize ngram lookup performance (#9333 )	2024-10-16 13:37:45 -06:00
Lily Liu	89feb4c84d	[SpecDec] Remove Batch Expansion (2/3) (#9298 )	2024-10-12 05:13:37 +00:00
Wallas Henrique	8baf85e4e9	[Doc] Compatibility matrix for mutual exclusive features (#8512 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-11 11:18:50 -07:00
TJian	23fea8714a	[Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101 )	2024-10-06 13:00:04 +08:00
youkaichao	9aaf14c62e	[misc] add forward context for attention (#9029 )	2024-10-03 12:09:42 -07:00
Lily Liu	1570203864	[Spec Decode] (1/2) Remove batch expansion (#8839 )	2024-10-01 16:04:42 -07:00
Travis Johnson	01b6f9e1f0	[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-09-24 17:29:56 -07:00
Lily Liu	c6bd70d772	[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701 )	2024-09-22 12:34:14 -07:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Kevin Lin	5faedf1b62	[Spec Decode] Move ops.advance_step to flash attn advance_step (#8224 )	2024-09-10 13:18:14 -07:00
Lily Liu	e6a26ed037	[SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244 )	2024-09-01 21:23:29 -07:00
afeldman-nm	428dd1445e	[Core] Logprobs support in Multi-step (#7652 )	2024-08-29 19:19:08 -07:00
Jonas M. Kübler	f205c09854	[Bugfix] Unify rank computation across regular decoding and speculative decoding (#7899 )	2024-08-28 22:18:13 -07:00
Nick Hill	1856aff4d6	[Spec Decoding] Streamline batch expansion tensor manipulation (#7851 )	2024-08-25 15:45:14 -07:00


110 Commits