xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-22 12:36:46 +08:00

Author	SHA1	Message	Date
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
TJian	edb59a9470	[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#28500 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-12 05:01:14 -08:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Harry Mellor	c0a4b95d64	Fix issues from #28242 (#28257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 04:23:17 +00:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
Vadim Gimpelson	d4e547bb7e	Revert "[PERF] Decouple projections from GDN custom op" (#28080 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 15:58:23 -08:00
Vadim Gimpelson	5fd8f02ea9	[PERF] Decouple projections from GDN custom op (#27512 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 08:11:41 -08:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
Zhiyuan Li	4e68cc9b6a	[Model] Introduce Kimi Linear to vLLM (#27809 ) Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-30 21:02:27 +08:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
fhl2000	284cc92275	[MISC] `cudagraph_capture_sizes` related improvements (#26016 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-24 05:11:05 -07:00
Andy Lo	b63f2143f8	[LoRA] LoRA cuda graph specialization (#25914 ) Signed-off-by: Andy Lo <andy@mistral.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-20 04:21:09 +00:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Luka Govedič	2dcd12d357	[torch.compile] Fix tests for torch==2.9 inductor partition (#26116 ) Signed-off-by: ProExpertProg <lgovedic@redhat.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-10-14 19:55:02 -04:00
Boyuan Feng	ca683a2a72	use combo kernel to fuse qk-norm and qk-rope (#26682 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-14 09:40:59 -04:00
Morrison Turnansky	e3fdb627d9	[FrontEnd] UNREVERT CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops (#26502 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2025-10-13 22:47:16 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Angela Yi	01653a917b	[compile] Fix inductor partition config (#26645 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-10-11 21:03:14 +00:00
baonudesifeizhai	cddce79fda	[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-10 16:35:28 +00:00
Lucas Wilkinson	29255cfc3b	[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-10 01:20:31 -04:00
Jiangyun Zhu	5728da11ea	Revert #26113 "[Frontend] CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-09 05:43:55 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Morrison Turnansky	0c824fc46f	[Frontend] CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops (#26113 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2025-10-07 12:53:43 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Jiangyun Zhu	43227236ec	[torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-29 13:54:52 +00:00
Jiangyun Zhu	c0ec81836f	[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-27 16:09:00 +00:00
fhl2000	f075693da7	[V1] address post issues related to #20059 (part 1) (#23046 ) Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-26 15:58:19 -04:00
Isotr0py	71b25b0d48	[V0 deprecation] Clean up V0 fallback in compilation config (#25675 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-25 17:29:51 +00:00
Michael Goin	24fab45d96	[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:29:26 -04:00
Luka Govedič	d5e0fca264	[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091 ), fix test (#24376 ), and prep for custom op matching (#24604 ) (#24542 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-22 12:30:05 -07:00
Boyuan Feng	8945b001db	[torch.compile] CUDAGraph Inductor partition integration (#24281 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Signed-off-by: boyuanfeng <boyuan@meta.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-20 01:02:15 +00:00
Wentao Ye	d2a30a2d93	[Bug] Fix torch Compilation Cache Hit Error (#25093 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-18 12:38:37 -07:00
Tao He	e93f4cc9e3	Add the support for the qwen3 next model (a hybrid attention model). (#24526 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 15:32:09 +08:00
Wentao Ye	a55cf41a09	[Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123 )	2025-09-09 10:21:10 -04:00
nopperl	fa4311d85f	[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998 ) Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp> Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com> Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>	2025-09-03 08:24:02 -07:00
co63oc	1bd007f234	fix some typos (#24071 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00
Thomas Parnell	dd58932280	[V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models (#22589 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-27 10:05:16 -07:00
Harry Mellor	b00e69f8ca	Fix nits from #20059 (#23548 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 03:27:20 -07:00
Paul Pak	2e2000f352	[Model] Add LFM2 architecture (#22845 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-08-21 09:35:07 +02:00
Asaf Joseph Gardin	3663870c72	[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035 ) Signed-off-by: asafg <asafg@ai21.com> Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-20 20:08:51 -07:00
fhl2000	74f441f4b5	[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-08-15 10:01:39 -04:00
Harry Mellor	2d18256e47	Move `ParallelConfig` from `config/__init__.py` to `config/parallel.py` (#22565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 08:33:46 -07:00
Harry Mellor	e3edc0a7a8	Extract `CompilationConfig` from `config.py` (#22524 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 16:34:25 -07:00

49 Commits