xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-10 10:40:44 +08:00

Author	SHA1	Message	Date
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Benjamin Chislett	1986de1375	[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-28 22:25:05 +00:00
rasmith	3999442f1c	[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py (#29252) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-23 04:45:08 +00:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
Dipika Sikka	413ef7a3b4	[Speculators] Move tests + fix integration (#27308) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: rahul-tuli <rtuli@redhat.com> Co-authored-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-10-29 00:54:21 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Matthew Bonanni	76879cc160	[Attention] Implement universal BACKEND_MAP (#25900) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 12:00:25 -07:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Matthew Bonanni	2aaa423842	[Attention] Move Backend enum into registry (#25893) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:32:24 -07:00
Lucia Fang	001e50c92c	[Model] MTP fallback to eager for DeepSeek v32 (#25982) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-01 01:53:22 +00:00
qizixi	c70ac4b8ff	[spec decode] Consolidate speculative decode method name for MTP (#25232) Signed-off-by: zixi-qi <qizixi@meta.com>	2025-09-26 22:27:05 +00:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Cyrus Leung	2f17117606	[mypy] Fix wrong type annotations related to tuple (#25660) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 13:00:45 +00:00
jiahanc	d5944d5146	[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-23 15:44:35 -04:00
Lucas Wilkinson	cc1dc7ed6d	[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-23 16:02:10 +00:00
Benjamin Chislett	b7433ca1a4	[Spec Decode] Efficient padded speculation (#24539) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-09-18 01:07:24 -04:00
Sage Moore	567939953b	[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-16 12:21:48 -04:00
Harry Mellor	f36355abfd	Move `LoadConfig` from `config/__init__.py` to `config/load.py` (#24566) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-10 06:14:18 -07:00
Didier Durand	d7e1e59972	[Doc]: fix typos in Python comments (#24093) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:05:45 -07:00
Didier Durand	fad73be1a5	[Doc]: fix typos in Python comments (#24077) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 02:38:55 -07:00
Woosuk Kwon	d6d13bd49e	[Misc] Add max_seq_len to CommonAttentionMetadata (#23216) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-20 09:05:29 -07:00
Jialin Ouyang	31a500c86f	[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-08-13 14:44:06 -07:00
Giancarlo Delfin	d94e3026de	[V1] Add tree drafting tests for eagle spec decoding (#22705) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-13 04:11:28 -07:00
TJian	65abe111a3	[CI] Skip Tree Attn Test in `test_max_len.py` to unblock CI (#22664) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-11 10:36:05 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Harry Mellor	7e3a8dc906	Remove `from_dict` from `SpeculativeConfig` (#22451) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 10:13:04 -07:00
Lucas Wilkinson	1dc8a70b6d	[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-08-06 18:40:52 -07:00
Giancarlo Delfin	5ea71ff46f	[V1] reduce block size for tree attention correctness test to fix 'ou… (#22207) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-04 19:11:06 -07:00
Giancarlo Delfin	aa7012eb6d	Add tree attention backend for v1 (part 1) (#20401) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-03 22:13:26 -07:00
Chen Zhang	555e7225bc	[v1][attention] Support Hybrid Allocator + FlashInfer (#21412) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-07-30 01:45:29 +00:00
Cyrus Leung	86ae693f20	[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-27 19:42:40 -07:00
Lucas Wilkinson	76b494444f	[Attention] Refactor attention metadata builder interface (#20466) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-17 04:44:25 +00:00
Liangliang Ma	a0389e0554	[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-02 09:06:04 +08:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
qizixi	c1e4a4052d	[V1][Spec Decode] Support multi-layer eagle draft model (#18030) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 09:45:34 +00:00
qizixi	d55e446d13	[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 06:51:22 +00:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Lucia Fang	8795eb9975	[Bugfix] Fix test_eagle test (#18223) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 15:59:42 -07:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00

54 Commits