xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-18 02:16:59 +08:00

Author	SHA1	Message	Date
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Wentao Ye	17eb25e327	[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 04:44:50 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Laith Sakka	1f0d184590	[aot_compile]change VLLM backend to read fake args from example_value (#29104 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-04 17:33:45 -05:00
Arpit Khandelwal	dfdda96747	[Core] Remove forced None assignment for deprecated PassConfig flags (#29994 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 09:15:04 +00:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): Set up -O infrastructure (#26847 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Huamin Li	70d5953f82	Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841 )" (#29483 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-26 22:27:26 +08:00
Harry Mellor	51fc9e017a	Scheduled removal of `CompilationConfig.use_inductor` (#29323 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 12:55:42 +00:00
elvischenv	6330f9477d	[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-11-25 07:59:40 +00:00
Laith Sakka	7a228b5305	Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-24 10:12:41 -05:00
Copilot	61728cd1df	Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966 ) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 13:32:19 -05:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
Yanan Cao	2c8b9182b5	[CI] Reorganize compile tests so new tests are automatically included in CI (#28625 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-19 06:13:50 -08:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Boyuan Feng	fd75d3e8c0	[Minor] avoid register new custom and just import silly_attn (#28578 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-14 09:32:31 +00:00
Yanan Cao	262d263f6c	[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-13 15:09:05 -05:00
Roger Wang	d3387750f1	[Misc] Turn off encoder torch compile by default (#28634 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-13 08:38:08 -08:00
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
TJian	edb59a9470	[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#28500 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-12 05:01:14 -08:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
Adrian Abeyta	d23539549a	Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-12 00:34:58 +00:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
jvlunteren	533b018f72	[BugFix] Fix Failing Ruff Check (#28469 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-11-11 06:41:43 -08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
Copilot	a736e5ff77	[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074 )	2025-11-07 15:58:16 +08:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
gmagogsfm	bde5039325	[CI] Add compile/test_multimodal_compile.py to CI (#28151 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 05:41:47 +00:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
fhl2000	284cc92275	[MISC] `cudagraph_capture_sizes` related improvements (#26016 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-24 05:11:05 -07:00
fhl2000	85fee74b33	[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427 ) Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>	2025-10-23 20:31:14 -07:00
dongbo910220	a0003b56b0	[Chore] Separate out system utilities from vllm.utils (#27201 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 20:25:25 +00:00
Harry Mellor	8f18feb191	Remove last `level` references not removed in #26355 (#27260 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-22 09:18:17 +00:00
Jiangyun Zhu	ab3e80042e	[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-22 00:22:39 -04:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Luka Govedič	bd7157a071	[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 08:10:23 -06:00
Boyuan Feng	17c540a993	[torch.compile] fix simple inductor graph partition test (#27050 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-16 21:09:36 -04:00
Lucia Fang	11ae016bd7	[torch.compile] Passing only necessary compilation config to inductor pass config (#27041 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-10-17 00:01:52 +00:00
Richard Zou	9b6504c307	[BugFix] Work around graph partition x torch.compile cache issue (#26956 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-10-15 20:06:11 -07:00

195 Commits