xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-14 19:25:01 +08:00

Author	SHA1	Message	Date
Benjamin Chislett	cee182b297	[Perf][V1] Fully overlap model execution (#23569 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-05 18:20:17 -07:00
bnellnm	e9b92dcd89	[Kernels] Overlap shared experts with send/recv (#23273 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-03 12:35:18 -04:00
Wentao Ye	98aee612aa	[Log] Only Print Profiler Results on Rank 0 (#23370 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-02 18:53:34 +00:00
Didier Durand	9701352e4b	[Doc]: fix typos in Python comments (#24001 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-31 08:21:59 +00:00
Andy Lo	038e9be4eb	[LoRA] Much faster startup when LoRA is enabled (#23777 ) Signed-off-by: Andy Lo <andy@mistral.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-30 15:37:39 +00:00
Nick Hill	d90d8eb674	[BugFix] Async scheduling and PP compatibility with DP (#23770 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-29 08:17:27 -07:00
weiliang	ae067888d6	Update Flashinfer to 0.2.14.post1 (#23537 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-25 18:30:44 -07:00
Chaojun Zhang	8a044754bd	[XPU] Delay BF16 check to worker init for spawn compatibility (#22979 ) Signed-off-by: chzhang <chaojun.zhang@intel.com>	2025-08-25 13:09:26 -07:00
22quinn	2a167b2eeb	[test][RL] Add sleep level 2 test and fix reload with sleep mode (#23521 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-26 00:25:52 +08:00
Ning Xie	325aa3dee9	[Misc] local import code clean (#23420 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-22 14:01:35 +00:00
rongfu.leng	4fbda0b20c	[Feature] use --eplb_config to set eplb param (#20562 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-20 14:07:28 -07:00
Woosuk Kwon	c9b38be8aa	[Spec Decode] Make `propose_draft_token_ids` non-blocking for lower TTFT (#23041 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-18 17:20:38 -07:00
fhl2000	74f441f4b5	[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-08-15 10:01:39 -04:00
nvjullin	279a5f31b3	[Kernel] Add nvfp4 gemm flashinfer backends (#22346 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-14 16:03:55 -04:00
Varun Sundar Rabindranath	f703b923f3	[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-08 16:09:59 -07:00
David Ben-David	aefeea0fde	[V1] [P/D] Refactor KV Connector Path (#21980 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-08-03 04:03:40 -07:00
fhl2000	23322431c8	[V1][CUDA] Full cudagraph support for FlashInfer (#21367 )	2025-08-01 21:49:34 -04:00
Csrayz	b917da442b	Expose PyTorch profiler configuration to environment variables (#21803 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2025-07-29 19:46:31 -07:00
Kuntai Du	b18b417fbf	Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-07-28 20:15:18 +00:00
Adeline	15a72ac478	[V1] Exception Handling when Loading KV Cache from Remote Store (#21534 ) Signed-off-by: liuyumoye <adeline_ly2023@outlook.com> Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>	2025-07-27 20:34:17 -07:00
Ye (Charlotte) Qi	a40a8506df	[Misc] Improve memory profiling debug message (#21429 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-26 07:07:21 -07:00
Cyrus Leung	46d81d6951	[V1] Get supported tasks from model runner instead of model config (#21585 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-25 05:36:45 -07:00
Nick Hill	eec6942014	[BugFix] Fix KVConnector TP worker aggregation (#21473 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-23 20:56:49 -07:00
22quinn	5c9b807b34	[Core] Add `reload_weights` RPC method (#20096 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-23 14:24:52 -07:00
kourosh hakhamaneshi	9f414a12ad	[BugFix] Make PD work with Ray (#21072 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-07-19 08:46:50 -07:00
Rui Qiao	217937221b	Elastic Expert Parallel Initial Support (#20775 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-18 17:46:09 -07:00
Cyrus Leung	45badd05d0	[Core] Set pooling params based on task and model (#21128 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 05:41:17 -07:00
22quinn	8632e831ba	[Core] Add `update_config` RPC method (#20095 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-14 00:49:18 +00:00
Nick Hill	574ad60db9	[KVConnector] Always call connector `clear_metadata()` at end of step (#20756 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: David Ben-David <sdavidbd@gmail.com>	2025-07-10 22:37:27 +01:00
Or Ozeri	cc876d0f29	[KVConnector] Aggregate finished requests on the scheduler (#19555 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-07-10 09:22:18 +01:00
Nick Hill	59389c927b	[BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-10 14:24:20 +08:00
Kunshang Ji	0b407479ef	[misc]refactor `Platform.set_device` method (#20262 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-09 01:39:47 +00:00
Yang Yang	6e2c19ce22	[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410 ) Signed-off-by: dbyoung18 <yang5.yang@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-07 04:32:32 +00:00
Bowen Wang	e9fd658a73	[Feature] Expert Parallelism Load Balancer (EPLB) (#18343 ) Signed-off-by: Bowen Wang <abmfy@icloud.com>	2025-06-26 15:30:21 -07:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
Isotr0py	1173804dca	[Bugfix] Fix TP inference for Flex attention backend (#19657 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-16 11:21:37 +00:00
Ye (Charlotte) Qi	cc867be19c	[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-10 08:40:01 +08:00
Luka Govedič	2d8476e465	[BugFix][V1] Fix memory profiling bug (#18974 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-07 10:34:51 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Divakar Verma	774c5fde30	[V1] fix torch profiling for V1 offline scenarios (#18445 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-05-28 04:16:30 +00:00
youkaichao	6a7988c55b	Refactor pplx init logic to make it modular (prepare for deepep) (#18200 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-23 23:43:43 +08:00
Harry Mellor	a1fe24d961	Migrate docs from Sphinx to MkDocs (#18145 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 02:09:53 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956 )	2025-05-14 13:11:54 -07:00
Jee Jee Li	822de7fb94	[Misc] Split model loader (#17712 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-07 12:42:26 +08:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211 )	2025-04-25 14:06:01 -06:00

90 Commits