xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-07 01:15:42 +08:00

Author	SHA1	Message	Date
Driss Guessous	3fd74189db	Fixes bench (#29058 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-11-20 21:21:54 +00:00
rasmith	5e5a7eb16f	[CI/Build] Make test_attention_selector.py run tests on correct platform (#29064 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-20 20:45:56 +00:00
rasmith	3d84ef9054	[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 20:39:49 +00:00
Software Developer	4d01b64284	[Bugfix] - Add Trace Headers to Beam Search Path (#29100 ) Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>	2025-11-20 20:00:33 +00:00
Kevin H. Luu	114b0e2500	[chore] Update annotate release scripts (#29077 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-20 10:22:40 -08:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
Pan Li	e5bfcb6a88	[BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628 ) Signed-off-by: PAN <1162953505@qq.com>	2025-11-20 17:38:31 +00:00
Alexei-V-Ivanov-AMD	22924383e1	Updating the mirror of test-amd.yaml as of 2025-11-18 (#29016 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-11-20 12:07:06 -05:00
rookie	56f45eddaf	[Frontend] Optimize beam search loop by sorting and then splicing (#19347 ) Signed-off-by: zhangguozhu <zhangguozhu@360.cn> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: zhangguozhu <zhangguozhu@360.cn> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-20 09:02:30 -08:00
TJian	82b05b15e6	[BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-20 16:34:11 +00:00
Fanli Lin	a2e9ebe9e2	[BugFix] Fix flash_attn import in `siglip2navit.py` (#29082 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-11-20 12:14:29 +00:00
Zhewen Li	93c8672ceb	[Bugfix] Fix spec decode memory regression after #28549 (#28819 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-20 19:05:50 +08:00
Samit	371b1d4c61	[RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037 ) Signed-off-by: SamitHuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: samithuang <285365963@qq.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-20 03:01:03 -08:00
Shinichi Hemmi	c9e093116c	[MODEL] Implement plamo3 (#28834 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-20 03:00:19 -08:00
Or Ozeri	c0c2dd1e0b	[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 18:55:10 +08:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Anna Shors	6eb745d9bd	Add truncate arg to yarn to match openai implementation of gpt-oss (#28244 ) Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-20 18:53:50 +08:00
cjackal	66483a9d00	[Chore] Update `xgrammar` version from 0.1.25 to 0.1.27 (#28221 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-11-20 02:53:09 -08:00
Jinzhen Lin	edfe867208	[Misc] don't cache `CUTLASS_REVISION` var in CMakeLists.txt (#28518 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-20 02:52:53 -08:00
Dezhan	dc45efc8ef	[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-20 02:52:36 -08:00
Vensen	fb8851f254	[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu (#28760 ) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensenmu <vensenmu@gmail.com>	2025-11-20 02:52:02 -08:00
Boyuan Feng	a903d59ffa	cleanup at::Tag::needs_fixed_stride_order (#28974 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 02:51:36 -08:00
rasmith	322cb02872	[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 17:48:09 +08:00
Wentao Ye	2c52c7fd9a	[Bug] Fix torch dynamo warning Dynamo detected a call to a `functools.lru_cache` (#29038 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 16:52:23 +08:00
Bradley D	1e1c06789e	[ci][amd] fix EPLB execution test (#28742 ) Signed-off-by: Bradley Davis <bradleyhd@meta.com>	2025-11-20 14:53:38 +07:00
Pleaplusone	7218f83992	[ROCm][BugFix] Fix shared expert loading error when disable `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS` (#28633 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 14:50:23 +07:00
Cyrus Leung	20e4497be2	[V0 Deprecation] Remove `num_lookahead_slots` (#29000 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-20 06:39:10 +00:00
Quentin Gallouédec	1c7bcc55b8	[Frontend] Allow parsed tool arguments (#28820 ) Signed-off-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 22:20:12 -08:00
Lukas Geiger	a9705a290a	[Model][QwenVL] Replace `torch.repeat_interleave` with faster `np.repeat` (#28964 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 22:04:23 -08:00
Isotr0py	64192d5624	[Bugfix] Revert custom attention mask for gemma3-mm (#28995 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 13:23:22 +08:00
Canlin Guo	fe25772aa9	[Bugfix] Handle broken frames in video loading (#29001 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com> Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>	2025-11-20 04:38:12 +00:00
prashanth058	0cca9b4d13	[Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM (#28972 ) Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>	2025-11-20 03:50:37 +00:00
Shengliang Xu	a8c536829c	Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076 ) Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>	2025-11-19 22:39:36 -05:00
Benjamin Chislett	fcbcba6c70	[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 19:17:48 -08:00
Fadi Arafeh	3168285fca	[cpu][ci] Add initial set of tests for Arm CPUs (#28657 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-20 02:37:09 +00:00
Qiang Zhang	3fb0d90999	[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715 ) Signed-off-by: chiangzhang <chiangzhang@tencent.com>	2025-11-20 02:11:52 +00:00
Kuntai Du	05c2dee7e9	[DeepSeek + LMCache Multiprocess] handle MLA for deepseek model + LMCache Multiprocess connector (#29039 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-20 01:40:49 +00:00
liangel-02	1d642872a2	[torchao] fix safetensors for sharding (#28169 ) Signed-off-by: Angel Li <liangel@meta.com>	2025-11-19 16:39:45 -08:00
Nick Hill	9ccef8e333	[Misc] Colorize logs (#29017 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-19 19:26:04 -05:00
Jialin Ouyang	537cc635c7	[GC Debugger] Simply and improve GC Debugger Utils (#29029 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 00:10:22 +00:00
Wentao Ye	5031cd5d55	[Refactor] Optimize `select_experts` (#28069 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 18:53:15 -05:00
Alexander Matveev	3aaa94ac99	[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-19 15:47:13 -08:00
JartX	8e38e99829	[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849 )	2025-11-19 18:30:08 -05:00
Wentao Ye	0075bfffd4	[CI] Fix precommit `rope_theta` issue (#29040 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 14:22:43 -08:00
Max Hu	cb0a7b4bea	[Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` (#29018 ) Signed-off-by: Max Hu <hyoung2991@gmail.com>	2025-11-19 21:54:15 +00:00
Lucas Wilkinson	8f4f77a727	[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-19 13:43:54 -08:00
Micah Williamson	22e44ad589	[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-19 21:31:33 +00:00
Yongye Zhu	88f5b19f0b	[DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-11-19 16:30:04 -05:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990 ) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
Julien Denize	cdeec2e606	[BugFix] Ray with multiple nodes (#28873 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-19 21:20:58 +00:00

11541 Commits