xinyun/vllm - vllm - Silk Road Xinyun (丝路新云) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-24 07:11:18 +08:00

Author	SHA1	Message	Date
Thomas Parnell	6bd1dd9d26	[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)	2025-03-06 07:39:16 -08:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Nick Hill	ac60dc7fe1	[V1][BugFix] Fix for mixed top_k batch (#14301) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>	2025-03-05 20:43:04 +00:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Nick Hill	a32c8669ca	[V1][Minor] Remove obsolete FIXME comment (#14304) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-05 11:59:23 -08:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Lu Fang	8d6cd32b7b	[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 08:49:44 +00:00
Roger Wang	ec79b67c77	[Misc][V1] Avoid using `envs.VLLM_USE_V1` in mm processing (#14256) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-05 07:37:16 +00:00
Tyler Michael Smith	72c62eae5f	[V1] EP/TP MoE + DP Attention (#13931)	2025-03-04 21:27:26 -08:00
Cody Yu	ade3f7d988	[V1][Bugfix] Do not reset prefix caching metrics (#14235)	2025-03-05 04:39:13 +00:00
Michael Goin	fbfc3ee37e	[V1][TPU] TPU multimodal model support for ragged attention (#14158) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-03-04 19:58:48 -05:00
Siyuan Liu	beebf4742a	[TPU][Profiler] Support start_profile/stop_profile in TPU worker (#13988) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-03-04 14:40:06 -05:00
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
iefgnoix	79e4937c65	[v1] Add comments to the new ragged paged attention Pallas kernel (#14155) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-03-03 23:00:55 +00:00
Mark McLoughlin	ae122b1cbd	[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 19:04:45 +00:00
Nick Hill	872db2be0e	[V1] Simplify stats logging (#14082) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-03 10:34:14 -08:00
Mark McLoughlin	4167252eaf	[V1] Refactor parallel sampling support (#13774) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 08:15:27 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Jun Duan	82fbeae92b	[Misc] Accurately capture the time of loading weights (#14063) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-01 17:20:30 -08:00
Chen Zhang	d54990da47	[v1] Add `__repr__` to KVCacheBlock to avoid recursive print (#14081)	2025-03-01 20:46:02 +00:00
Chen Zhang	b9f1d4294e	[v1][Bugfix] Only cache blocks that are not in the prefix cache (#14073)	2025-03-01 08:25:54 +00:00
Sage Moore	b28246f6ff	[ROCm][V1][Bugfix] Add get_builder_cls method to the ROCmAttentionBackend class (#14065) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-03-01 07:18:32 +00:00
Li, Jiang	02296f420d	[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)	2025-02-28 22:31:01 -08:00
Chen Zhang	28943d36ce	[v1] Move block pool operations to a separate class (#13973) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-28 20:53:31 +00:00
Chen Zhang	e7bd944e08	[v1] Cleanup the BlockTable in InputBatch (#13977) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-02-28 19:03:16 +00:00
iefgnoix	c3b6559a10	[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-02-28 11:01:36 -07:00
Lucas Wilkinson	2e94b9cfbb	[Attention] Flash MLA for V1 (#13867) Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Yang Chen <yangche@fb.com>	2025-02-27 23:03:41 +00:00
Woosuk Kwon	cd813c6d4d	[V1][Minor] Minor cleanup for GPU Model Runner (#13983) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-27 13:11:40 -08:00
Yang Chen	58d1b2aa77	[Attention] MLA support for V1 (#13789) Signed-off-by: Yang Chen <yangche@fb.com>	2025-02-27 13:14:17 -05:00
Mark McLoughlin	cd711c48b2	[V1][Metrics] Handle preemptions (#13169)	2025-02-26 20:04:59 -08:00
Lily Liu	5629f26df7	[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)	2025-02-25 18:14:48 -08:00
Varun Sundar Rabindranath	03f48b3db6	[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705)	2025-02-25 00:18:02 -08:00
Mark McLoughlin	bc32bc73aa	[V1][Metrics] Implement vllm:lora_requests_info metric (#13504)	2025-02-24 20:01:33 -08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555)	2025-02-24 17:13:52 -08:00
Roger Wang	227578480d	Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775)	2025-02-24 09:16:05 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Roger Wang	437b76ff59	[V1][Core] Fix memory issue with logits & sampling (#13721)	2025-02-24 06:10:06 -08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Sage Moore	558db8083c	[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095)	2025-02-22 05:25:41 -08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696)	2025-02-22 00:31:26 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295)	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299)	2025-02-22 00:20:00 -08:00
Jennifer Zhao	da31b5333e	[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-02-22 00:08:29 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666)	2025-02-21 22:06:34 -08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00


279 Commits