xinyun/vllm - vllm - 丝路新云 code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-27 09:55:14 +08:00

Author	SHA1	Message	Date
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
Chendi.Xue	c9e665852a	[NIXL] heterogeneous block_size support (#26759 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-11-14 21:51:32 -08:00
Nicolò Lucchesi	96b23b8e3b	[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-14 22:40:05 +08:00
Nick Hill	bc3e43069a	[BugFix] Fix multi-modal async scheduling race condition (#28706 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 01:11:13 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
Yihua Cheng	94a9ebcf31	[KV connector][WIP] KV cache proxy based on LMCache multi-process mode (#27902 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-11-12 20:25:43 +00:00
ZhengHongming888	c5f10cc139	add cpu option for p/d in nixl_connector (#28356 ) Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>	2025-11-12 11:53:08 +00:00
ziruiliu	d143152308	[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector (#27978 ) Signed-off-by: Zirui Liu <ziliu@ddn.com> Signed-off-by: ziruiliu <ziliu@ddn.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-12 11:44:58 +01:00
Chenguang Zheng	91864b79b3	[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD (#28521 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-11 23:09:33 -08:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Ilya Markov	1788aa1efb	[BugFix] Graceful handling of torch symm mem errors. (#27671 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-11 17:41:54 -07:00
Nicolò Lucchesi	a7ef3eb0cd	[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282 )	2025-11-11 16:57:43 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Sage Moore	798c7bebca	[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU transfers in EPLB (#28369 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-11-11 00:19:51 -08:00
David Ben-David	cc079763c5	[BugFix] Avoid calling KV connector layer APIs when metadata is unset (#28253 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 23:39:36 -08:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Nick Hill	67a2da890e	[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 22:11:03 +00:00
Nicolò Lucchesi	68a72a5cc1	Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012 )" (#28289 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-07 15:07:01 +00:00
Boyuan Feng	0f872b7977	[Log] update shm wait time msg (#28255 )	2025-11-07 09:43:30 -05:00
Samuel Shen	40db194446	[CI]: Add LMCacheConnector Unit Tests (#27852 ) Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>	2025-11-05 09:45:57 -08:00
Ilya Markov	e50c454672	[BugFix] Support EP/DP + EPLB with MTP (#25311 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-05 15:22:17 +00:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Nick Hill	c9f66da8fd	[PerfFix] Avoid separate thread for MP executor shm spin (#28012 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 08:33:55 -08:00
bnellnm	938772af03	[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123 )	2025-11-04 21:59:45 +08:00
Mark McLoughlin	58279c60b5	[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-03 23:00:49 -08:00
Yue Zhang	685c99ee77	[KV offload] Offloading connector async scheduling support (#27648 ) Signed-off-by: KevinCheung2259 <2651309292@qq.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-01 21:08:56 +00:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
GuanLuo	d6517be3cd	[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338 ) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-31 10:16:00 -07:00
Wentao Ye	a8141fa649	[Refactor] Remove `VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK` (#27750 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-30 15:32:39 -04:00
Ilya Markov	60f76baa66	[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices (#27564 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-10-30 11:41:44 -04:00
Nick Hill	2ce5c5d3d6	[BugFix] Handle unscheduled requests properly when async scheduling (#27756 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-29 21:04:25 -07:00
Nick Hill	d4aa144343	[BugFix] Fix handling of resumed reqs in `SharedStorageConnector` (#27719 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-29 20:16:52 +00:00
Nicolò Lucchesi	accb8fab07	[KVConnector] Add metrics to Prometheus-Grafana dashboard (#26811 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-10-29 18:44:49 +00:00
Wentao Ye	5522fb274b	[Chore] Optimize P2PNCCLEngine `http_address` (#27488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-30 00:05:09 +08:00
Shaoting	a4a4f0f617	[KV Connector] Update lmcache connector with latest compatibility (#27681 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-10-29 05:38:37 +00:00
Kero Liang	02af36df36	[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer (#27117 ) Signed-off-by: Kero Liang <kerorek@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: donglu <donglu@cohere.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-28 15:01:24 +00:00
Samuel Shen	05e034f085	[nit]: Fix import for the lmcache integration (#27600 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-10-28 14:40:55 +00:00
Cyrus Leung	7c2bdb83dc	[Misc] Clean up utils (#27552 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 09:05:40 +00:00
Kuntai Du	b853540388	[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-10-24 23:34:18 -07:00
Zhuohan Li	56ed7609a9	Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502 )	2025-10-25 05:31:43 +00:00
Yihua Cheng	83f478bb19	[KVConnector] Migrate the LMCache integration code to be vLLM native (#25542 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-10-25 00:23:53 +00:00
Wentao Ye	52efc34ebf	[Log] Optimize Startup Log (#26740 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-24 19:27:04 -04:00
Pengchao Wang	d95d0f4b98	[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328 ) Signed-off-by: Pengchao Wang <wpc@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-10-24 14:16:44 -07:00
kourosh hakhamaneshi	7e1d697b56	[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366 ) Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-24 17:08:05 +00:00
Chendi.Xue	699d62e6cf	[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-24 17:01:41 +00:00
Rui Qiao	09a6a49eaf	[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-24 14:53:09 +08:00
usberkeley	c528b9006a	Fix EventPublisherFactory logic for disabled KV cache events (#27419 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-10-24 05:00:01 +00:00
Ilya Markov	237cf6d32a	[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-10-23 20:58:39 +08:00
dongbo910220	a0003b56b0	[Chore] Separate out system utilities from vllm.utils (#27201 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 20:25:25 +00:00

455 Commits