xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-13 17:47:15 +08:00

Author	SHA1	Message	Date
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546 )	2025-05-30 08:17:00 -07:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810 ) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Nicolò Lucchesi	32ce3cf7c9	[V1] Allocate kv_cache with stride order for V1 (#18775 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-05-29 17:54:16 +00:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
qizixi	c1e4a4052d	[V1][Spec Decode] Support multi-layer eagle draft model (#18030 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 09:45:34 +00:00
qizixi	d55e446d13	[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 06:51:22 +00:00
Robert Shaw	2b10ba7491	[Bugfix][Nixl] Fix Preemption Bug (#18631 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-23 23:30:16 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Chauncey	b046cf792d	[Feature][V1]: supports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00
Jee Jee Li	db5a29ba19	[Bugfix] Fix LoRA test (#18518 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-21 21:48:53 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 ) (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Bowen Wang	7fdfa01530	[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-16 15:14:03 -07:00
Seiji Eicher	541817670c	[Misc] Add Ray Prometheus logger to V1 (#17925 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-05-16 01:02:42 -07:00
Lucia Fang	8795eb9975	[Bugfix] Fix test_eagle test (#18223 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 15:59:42 -07:00
David Xia	de71fec81b	[CI] don't skip fixed `test_kv_cache_events()` (#18183 ) Signed-off-by: David Xia <david@davidxia.com>	2025-05-14 23:17:16 -07:00
Ning Xie	420caf7557	[UT] Add ut for none hash (#17892 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-15 13:28:11 +08:00
Mark McLoughlin	65334ef3b9	[V1][Metrics] Remove unused code (#18158 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-14 20:13:17 -07:00
Chen Zhang	e60f550b38	[v1] Support multiple KV cache groups in GPU model runner (#17945 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 18:54:54 -07:00
Michael Goin	2142035b51	[V1] Support multiple kv connectors (#17564 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-14 16:28:02 -07:00
Russell Bryant	78aa341d12	[CI] Fix race condition in test_kv_cache_events test (#18169 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 16:27:48 -07:00
Aaron Pham	2fc9075b82	[V1] Structured Outputs + Thinking compatibility (#16577 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 15:45:24 -07:00
Robert Shaw	856865008e	[CI] Disable Failing Tests (#18165 )	2025-05-14 13:49:56 -07:00
Nick Hill	59dd311cf5	[KVConnector] Keep KVTransferParams as a dict (#18033 )	2025-05-14 08:05:57 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Russell Bryant	ebab1ac37c	[CI] Make JSON output tests less likely to fail (#17859 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 22:31:54 +00:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623 ) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
Chen Zhang	ca66a1674c	[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:14:12 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Ning Xie	d310e6de98	[BUGFIX]: return fast when request requires prompt logprobs (#17251 )	2025-05-08 21:25:41 -07:00

256 Commits