xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 11:37:14 +08:00

Author	SHA1	Message	Date
Cyrus Leung	f6137adbcb	Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785 )" (#14892 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-16 09:13:46 -07:00
Kyle Sayers	d30aa7e9e6	[Bugfix] Limit profiling run sequence length by max_model_len (#14785 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-03-16 07:44:19 -07:00
Lucas Wilkinson	5952d8ab61	[Attention] Get rid of mla cache alignment (#14842 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-15 05:08:25 +00:00
Li, Jiang	a2ae496589	[CPU] Support FP8 KV cache (#14741 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-03-14 22:07:36 -07:00
yarongmu-google	dd344e0342	[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-15 00:41:15 +00:00
Jee Jee Li	b8b0ccbd2d	[Bugfix] Make the deviceprofiler include LoRA memory. (#14469 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-08 07:12:22 +00:00
Harry Mellor	f7a6bd0fa1	Fix missing `kv_caches` and `attn_metadata` in `OpenVINOCausalLM` (#14271 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-07 12:30:42 +00:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00
Siyuan Liu	beebf4742a	[TPU][Profiler] Support start_profile/stop_profile in TPU worker (#13988 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-03-04 14:40:06 -05:00
Zhanwen Chen	66233af7b6	Use math.prod instead of np.prod for trivial ops (#14142 )	2025-03-03 21:09:22 -08:00
Jun Duan	82fbeae92b	[Misc] Accurately capture the time of loading weights (#14063 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-01 17:20:30 -08:00
Kacper Pietkun	b91660ddb8	[Hardware][Intel-Gaudi] Regional compilation support (#13213 )	2025-02-28 00:51:49 -08:00
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Yang Zheng	4b1d141f49	[PP] Correct cache size check (#13873 ) Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>	2025-02-27 17:47:29 +08:00
Joe Runde	3f808cc044	[Bugfix] Do not crash V0 engine on input errors (#13101 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 19:07:29 +08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166 )	2025-02-22 00:21:30 -08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666 )	2025-02-21 22:06:34 -08:00
ajayvohra2005	6a417b8600	fix neuron performance issue (#13589 )	2025-02-20 10:59:36 -08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
Zhe Zhang	fdc5df6f54	use device param in load_model method (#13037 )	2025-02-19 16:05:02 +08:00
Yu-Zhou	d0a7a2769d	[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139 ) Signed-off-by: yuzhou <yuzhou@habana.ai> Signed-off-by: zhouyu5 <yu.zhou@intel.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-18 19:40:19 -08:00
Aoyu	2092a6fa7d	[V1][Core] Add worker_base for v1 worker (#12816 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-02-13 20:35:18 +08:00
Christian Pinto	974dfd4971	[Model] IBM/NASA Prithvi Geospatial model (#12830 )	2025-02-11 20:34:30 -08:00
shangmingc	913df14da3	[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-08 14:46:19 +00:00
Sanju C Sudhakaran	2880e21e3d	[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2025-02-08 17:15:30 +08:00
Roger Wang	bf3b79efb8	[VLM] Qwen2.5-VL	2025-02-05 13:31:38 -08:00
Harry Mellor	fcf2e3d7fc	[Bugfix] Fix OpenVINO model runner (#12750 )	2025-02-04 22:42:46 -08:00
Lucas Wilkinson	75e94309e8	[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676 ) Signed-off-by: simon-mo <xmo@berkeley.edu> Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-02-04 18:22:24 -08:00
Cody Yu	cf58b9c4ca	[MISC] Remove model input dumping when exception (#12582 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-03 13:34:16 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Lucas Wilkinson	baeded2569	[Attention] Deepseek v3 MLA support with FP8 compute (#12601 ) This PR implements the Deepseek V3 support by performing matrix absorption the fp8 weights --------- Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>	2025-01-31 21:52:51 -08:00
fade_away	cb3e73e4c8	[BugFix] fix wrong output when using lora and num_scheduler_steps=8 (#11161 ) FIX issue https://github.com/vllm-project/vllm/issues/9688 https://github.com/vllm-project/vllm/issues/11086 #12487 --------- Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: weilong.yu <weilong.yu@shopee.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-01 12:52:07 +08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Bowen Wang	2bc3fbba0c	[FlashInfer] Upgrade to 0.2.0 (#11194 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-01-27 18:19:24 +00:00
youkaichao	6dd94dbe94	[perf] fix perf regression from #12253 (#12380 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 11:34:27 +08:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
youkaichao	6e650f56a1	[torch.compile] decouple compile sizes and cudagraph sizes (#12243 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 02:01:30 +08:00
Konrad Zawora	96f6a7596f	[Bugfix] Fix HPU multiprocessing executor (#12167 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-01-23 02:07:07 +08:00
youkaichao	68ad4e3a8d	[Core] Support fully transparent sleep mode (#11743 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:39:32 +08:00
youkaichao	66818e5b63	[core] separate builder init and builder prepare for each batch (#12253 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:13:52 +08:00
youkaichao	c222f47992	[core][bugfix] configure env var during import vllm (#12209 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-20 19:35:59 +08:00
Cyrus Leung	59a0192fb9	[Core] Interface for accessing model from `VllmRunner` (#10353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-20 15:00:59 +08:00
youkaichao	da02cb4b27	[core] further polish memory profiling (#12126 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-18 12:25:08 +08:00
youkaichao	87a0c076af	[core] allow callable in collective_rpc (#12151 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-17 20:47:01 +08:00


439 Commits