xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-15 03:24:29 +08:00

Author	SHA1	Message	Date
Fanli Lin	a2e9ebe9e2	[BugFix] Fix flash_attn import in `siglip2navit.py` (#29082 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-11-20 12:14:29 +00:00
Zhewen Li	93c8672ceb	[Bugfix] Fix spec decode memory regression after #28549 (#28819 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-20 19:05:50 +08:00
Shinichi Hemmi	c9e093116c	[MODEL] Implement plamo3 (#28834 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-20 03:00:19 -08:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Anna Shors	6eb745d9bd	Add truncate arg to yarn to match openai implementation of gpt-oss (#28244 ) Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-20 18:53:50 +08:00
Dezhan	dc45efc8ef	[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-20 02:52:36 -08:00
Wentao Ye	2c52c7fd9a	[Bug] Fix torch dynamo warning Dynamo detected a call to a `functools.lru_cache` (#29038 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 16:52:23 +08:00
Pleaplusone	7218f83992	[ROCm][BugFix] Fix shared expert loading error when disable `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS` (#28633 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 14:50:23 +07:00
Lukas Geiger	a9705a290a	[Model][QwenVL] Replace `torch.repeat_interleave` with faster `np.repeat` (#28964 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 22:04:23 -08:00
Isotr0py	64192d5624	[Bugfix] Revert custom attention mask for gemma3-mm (#28995 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 13:23:22 +08:00
Shengliang Xu	a8c536829c	Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076 ) Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>	2025-11-19 22:39:36 -05:00
liangel-02	1d642872a2	[torchao] fix safetensors for sharding (#28169 ) Signed-off-by: Angel Li <liangel@meta.com>	2025-11-19 16:39:45 -08:00
Wentao Ye	5031cd5d55	[Refactor] Optimize `select_experts` (#28069 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 18:53:15 -05:00
JartX	8e38e99829	[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849 )	2025-11-19 18:30:08 -05:00
Wentao Ye	0075bfffd4	[CI] Fix precommit `rope_theta` issue (#29040 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 14:22:43 -08:00
Max Hu	cb0a7b4bea	[Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` (#29018 ) Signed-off-by: Max Hu <hyoung2991@gmail.com>	2025-11-19 21:54:15 +00:00
Yongye Zhu	88f5b19f0b	[DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-11-19 16:30:04 -05:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990 ) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
Wentao Ye	1607e664f0	[Bug] Fix Batch Invariant MLA test (#28967 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 21:18:32 +00:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Izzy Putterman	02f5903b84	Eagle: MM Cuda Graphs with MRope (#28896 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-19 15:01:05 -05:00
杰兮	9d2d561257	[Bugfix] Fix precision corruption when shared_experts_stream=None (#28942 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-11-19 19:30:57 +00:00
Robert Shaw	fe69f331f8	[Kernels] Improve H200 Fused MoE Config (#28992 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-19 19:23:54 +00:00
Yuxuan Zhang	0c80efd94f	GLM-V video segmentation solution adjustment (#28941 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-11-19 17:32:55 +00:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
Shanshan Shen	d44e9df7d4	[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-11-19 16:24:55 +00:00
Harry Mellor	4f5299f717	Relax Transformers modeling backend MoE experts check (#28952 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 21:50:30 +08:00
Chen Bruce	da2f6800e0	[Feat][Perf] Enable deepep-low-latency with round-robin expert placement. (#28449 ) Signed-off-by: bruceszchen <bruceszchen@tencent.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 13:46:24 +01:00
Lukas Geiger	3d4e7d34be	[Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 05:43:01 +00:00
Gleb Kurchanov	73ff872db0	[Bugfix] Fix typo in Qwen3 Next model executor (#28960 ) Signed-off-by: Gleb Kurchanov <nepherpitou@gmail.com>	2025-11-19 05:21:02 +00:00
Xin Yang	468a8d72ba	[Bugfix] Fix FusedMoEModularKernel for triton backend (#28913 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-19 13:05:22 +08:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jerry Zhang	da94c7c0eb	Move online quantization to `model.load_weights` (#26327 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-11-18 16:52:41 -08:00
tomeras91	1395461f5f	[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
Isotr0py	e4bb2684bc	[Models] Replace all `nn.Conv2d` with vLLM's Conv2dLayer (#28842 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 18:56:04 +00:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
Canlin Guo	b9489f51e1	[Model][Perf] Use cos and sin cache in QwenVL (#28798 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-18 11:51:54 +00:00
Ning Xie	0168f69e50	[Misc] Remove unnecessary parentheses from log statements (#28897 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-17 20:33:46 -08:00
Wentao Ye	3ddcf46011	[Refactor] Remove Unused Func in Batch Invariant (#28881 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 20:29:29 -08:00
xuebwang-amd	d0a73620cc	[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 11:16:45 +08:00
Pranav	f77bce001a	[Model] Add Afmoe architecture implementation (#28332 ) Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> Signed-off-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>	2025-11-17 15:11:20 -08:00
Shreyas Kulkarni	95ae50b7d1	[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435 ) Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>	2025-11-17 15:01:34 -08:00
Zhewen Li	f8b19c0ffd	[Bugfix] Fix GPT-OSS on AMD after #28603 (#28816 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-17 13:15:26 -05:00
wuyaoxuehun	ab01cd14e5	[BugFix] Fix glm4_moe_mtp load weights bug (#28805 ) Signed-off-by: wuyaoxuehun <798143193@qq.com>	2025-11-17 17:13:11 +08:00
jiahanc	561253b37f	[Performance][Fix] update nvfp4 code to support renorm routing (#28569 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-16 18:02:42 -08:00
amirkl94	03ee48111d	Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261 )	2025-11-16 13:39:44 -05:00
Lukas Geiger	5a87076d6e	[Model][QwenVL] Optimize `Qwen2_5_VisionAttention` q,k preparation (#28769 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-16 17:37:15 +00:00
Anna Shors	8d259fad6c	Fix gpt oss weight loading with EP + bf16 (#28765 ) Signed-off-by: ashors1 <ashors@nvidia.com>	2025-11-16 13:12:45 +00:00
Dezhan	af02c40970	Fixed gpt-oss _load_weights_other() parameter position bug (#28715 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-16 09:46:29 +00:00


3325 Commits