xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git (last synced 2026-05-19 20:57:21 +08:00)

Author	SHA1	Message	Date
vnadathur	1ffe934c8a	[torch.compile] caching of config fields should be opt-out by default (#26468 ) Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 06:13:54 -08:00
Didier Durand	7ed27f3cb5	[Doc]: fix typos in various files (#28945 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-18 22:52:30 -08:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jialin Ouyang	40b6b38f2c	[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-19 02:10:02 +00:00
Ning Xie	ac1daf3233	fix comment typo (#28802 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-16 17:03:21 +00:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Alexander Matveev	69d0e90313	[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 23:37:24 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
QiliangCui	3eb0c2673e	[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-11-12 22:31:14 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
wangxiyuan	e1710393c4	[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 18:22:16 -07:00
Ilya Markov	1788aa1efb	[BugFix] Graceful handling of torch symm mem errors. (#27671 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-11 17:41:54 -07:00
Max Hu	412e153df5	[Feature] Allow configuring FlashInfer workspace size (#28269 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 23:32:20 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Zhuohan Li	8d706cca90	[Misc] FlattenLogprobs -> FlatLogprobs (#28335 )	2025-11-11 03:41:23 +00:00
Wentao Ye	de540c0354	[Feature] Add env var `VLLM_MOE_USE_DEEP_GEMM` (#28422 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 02:29:48 +00:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Yong Hoon Shin	de2b78305f	[ROCm] Add env to enable/disable aiter triton gemm (#28321 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-08 22:27:00 -08:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
Pavani Majety	72b1c2ae2c	[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-11-07 04:18:39 -08:00
Jialin Ouyang	ccd98b59c1	[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-07 00:27:12 -08:00
Wentao Ye	90189c71a9	[Bug] Fix env string `"0"` same to `True` (#28159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:04:20 -08:00
Chenheli Hua	1fb4217a05	[Multimodal] Make MediaConnector extensible. (#27759 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-11-04 18:28:01 +00:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
Paul Zhang	e7acb20076	[Feature] Batch invariant torch.compile (#27660 ) Signed-off-by: PaulZhang12 <paulzhan@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-30 13:11:29 -07:00
Wentao Ye	a8141fa649	[Refactor] Remove `VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK` (#27750 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-30 15:32:39 -04:00
Boyuan Feng	a9fe0793f2	`use_aot_compile` should respect `VLLM_DISABLE_COMPILE_CACHE` (#27698 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-29 17:08:54 +00:00
Alec S	ab2eb27b74	[Frontend] [gpt-oss] Mcp type bug (#27689 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-29 10:01:32 +00:00
Alec S	3c7fefdeba	[Frontend] [gpt-oss] Tool json call parsing error retry (#27675 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-29 09:42:44 +00:00
Wentao Ye	d3ab240f39	[Bug] Fix deepep low latency use nvlink by default (#27677 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 23:53:12 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	936643a868	[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache (#27294 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-28 10:22:28 -04:00
Pengchao Wang	d95d0f4b98	[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328 ) Signed-off-by: Pengchao Wang <wpc@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-10-24 14:16:44 -07:00
Ming Yang	0f67d4d962	[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-24 10:24:08 -07:00
Richard Zou	cd390b609d	[compile] Turn standalone_compile back on (#27460 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-10-24 16:30:27 +00:00
Alexander Matveev	344a0017c0	[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-10-21 21:38:29 +00:00
Nick Hill	647214f3d5	[V0 Deprecation] Remove V0 executors (#27142 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-21 11:09:37 -07:00
Woosuk Kwon	fb860670da	[Minor] Remove unused env variable (#27161 )	2025-10-18 18:48:35 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Patrick von Platen	b038d9c40c	[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-17 08:24:42 -07:00
kliuae	1317034379	[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097 ) Signed-off-by: chenjun <junchen2@amd.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-10-16 10:41:34 +08:00
Wentao Ye	e5b438a247	[Bug] Temporally Disable `VLLM_ALLREDUCE_USE_SYMM_MEM` by Default (#26925 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-15 16:18:50 -04:00
Kaixi Hou	de92d916fe	[NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107 ) Signed-off-by: kaixih <kaixih@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-15 13:53:00 -04:00
Jialin Ouyang	380f17527c	[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 17:03:21 -04:00
Wentao Ye	6d87a2838c	[Config] Remove Unused Environment Variable `VLLM_DISABLE_PAD_FOR_CUDAGRAPH` (#26743 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-14 11:47:49 -04:00
Jialin Ouyang	cfded80793	[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 01:46:44 +00:00
Michael Goin	0d21b9b51e	[UX] Speedup DeepGEMM warmup with heuristics (#25619 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-13 07:59:27 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Zhengxu Chen	eef921f45e	AOT Compilation for torch.compile (Bundled) (#24274 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-10-10 19:02:11 -04:00
Rui Qiao	757fa4a4da	[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-09 19:53:43 -07:00
Wentao Ye	f8607863d8	[Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement (#26197 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-08 15:33:56 +08:00


328 Commits