xinyun/vllm - vllm - 丝路新云 code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-07-05 03:37:09 +08:00

Author	SHA1	Message	Date
Jevin Jiang	621ca2c0ab	[TPU] Increase block size and reset block shapes (#16458 )	2025-05-06 13:55:04 -04:00
Chen Zhang	aabcd2cae3	[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 08:50:34 -07:00
Chen Zhang	cba31c47c4	[v1] AttentionMetadata for each layer (#17394 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 07:58:37 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Nicolò Lucchesi	5941e0b7ea	[TPU][V1] Add support for top-logprobs (#17072 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-05 14:20:15 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Lucas Wilkinson	0f87d8f7b2	[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-02 11:01:38 -07:00
Robert Shaw	c777df79f7	[BugFix] Fix Memory Leak (#17567 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-02 01:07:03 -07:00
Lucas Wilkinson	afcb3f8863	[Attention] MLA move o_proj q_proj into cuda-graph region (#17484 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-02 03:16:26 +00:00
qizixi	39c0813a7f	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-01 16:19:30 -07:00
Chen Zhang	81ecf425f0	[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-30 18:25:53 +00:00
Russell Bryant	947f2f5375	[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-30 16:10:54 +00:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
rongfu.leng	d803786731	[V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None (#15755 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-30 18:20:39 +08:00
Gabriel Marinho	1c2bc7ead0	Truncation control for embedding models (#14776 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-30 09:24:57 +08:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Bryan Lu	70788bdbdc	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-29 21:10:00 +00:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
Chen Zhang	24e6ad3f16	[V1] Remove num_input_tokens from attn_metadata (#17193 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-29 09:28:41 -07:00
Cyrus Leung	ebb3930d28	[Misc] Move config fields to MultiModalConfig (#17343 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 06:37:21 +00:00
Zhengyuan Su (苏政渊)	17eb306fcc	[Bugfix] Add contiguous call inside rope kernel wrapper (#17091 ) Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn> Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn>	2025-04-28 19:24:07 -07:00
Ekagra Ranjan	e136000595	[V1][Spec Decode] Make Eagle model arch config driven (#17323 )	2025-04-29 10:22:02 +08:00
Michał Moskal	86d9fc29cb	implement Structural Tag with Guidance backend (#17333 ) Signed-off-by: Michal Moskal <michal@moskal.me>	2025-04-29 02:21:32 +00:00
Lucas Wilkinson	cc5befbced	[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) (#17283 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-28 13:55:50 -07:00
Lucas Wilkinson	d8bccde686	[BugFix] Fix vllm_flash_attn install issues (#17267 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-04-27 17:27:56 -07:00
Lily Liu	20e489eaa1	[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-27 09:29:43 -07:00
Cyrus Leung	4213475ec7	[Metrics] Fix minor inconsistencies in bucket progression (#17262 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 16:19:39 +00:00
cascade	690fe019f0	[Feature] support sequence parallelism using compilation pass (#16155 ) Signed-off-by: cascade812 <cascade812@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-27 06:29:35 -07:00
Flex Wang	18445edd0f	[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033 ) Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com>	2025-04-27 12:30:53 +00:00
Chen Zhang	838cedade7	[Bugfix] Get a specific type of layer from forward context (#17222 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-27 00:58:05 -07:00
Ning Xie	fd11a325b8	[MISC] rename interval to max_recent_requests (#14285 )	2025-04-26 16:59:18 +00:00
Ning Xie	dc2ceca5c5	[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-04-26 14:34:24 +00:00
Russell Bryant	f8acd01ff7	[V1] Add `structural_tag` support using xgrammar (#17085 )	2025-04-26 14:06:37 +00:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Woosuk Kwon	1cf0719ebd	[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:08:15 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211 )	2025-04-25 14:06:01 -06:00
Lu Fang	fc966e9cc6	Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158 )	2025-04-25 17:10:32 +08:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Shanshan Shen	b724afe343	[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-24 06:15:03 -07:00
Harry Mellor	21f4f1c9a4	Improve static type checking in `LoRAModelRunnerMixin` (#17104 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 06:14:47 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00

563 Commits