xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-04 04:57:09 +08:00

Author	SHA1	Message	Date
Alexander Matveev	3aaa94ac99	[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-19 15:47:13 -08:00
JartX	8e38e99829	[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849 )	2025-11-19 18:30:08 -05:00
Wentao Ye	0075bfffd4	[CI] Fix precommit `rope_theta` issue (#29040 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 14:22:43 -08:00
Max Hu	cb0a7b4bea	[Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` (#29018 ) Signed-off-by: Max Hu <hyoung2991@gmail.com>	2025-11-19 21:54:15 +00:00
Lucas Wilkinson	8f4f77a727	[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-19 13:43:54 -08:00
Micah Williamson	22e44ad589	[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-19 21:31:33 +00:00
Yongye Zhu	88f5b19f0b	[DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-11-19 16:30:04 -05:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990 ) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
Julien Denize	cdeec2e606	[BugFix] Ray with multiple nodes (#28873 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-19 21:20:58 +00:00
Wentao Ye	1607e664f0	[Bug] Fix Batch Invariant MLA test (#28967 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 21:18:32 +00:00
Ryan Rock	68d7231991	[CI/Build] Fix test_prefix_prefill for AMD (#28905 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-11-19 16:04:36 -05:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Izzy Putterman	02f5903b84	Eagle: MM Cuda Graphs with MRope (#28896 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-19 15:01:05 -05:00
Aleksandr Malyshev	ac10fd3c69	Upstreaming aiter triton attention backend as a new backend (#28701 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-19 19:59:30 +00:00
杰兮	9d2d561257	[Bugfix] Fix precision corruption when shared_experts_stream=None (#28942 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-11-19 19:30:57 +00:00
Robert Shaw	fe69f331f8	[Kernels] Improve H200 Fused MoE Config (#28992 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-19 19:23:54 +00:00
Jialin Ouyang	3319a493fc	[Core] Reuse created spec tokens lists to mitigate GC cost (#28917 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-19 19:20:22 +00:00
Copilot	61728cd1df	Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966 ) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 13:32:19 -05:00
Yuxuan Zhang	0c80efd94f	GLM-V video segmentation solution adjustment (#28941 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-11-19 17:32:55 +00:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
Shanshan Shen	d44e9df7d4	[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-11-19 16:24:55 +00:00
Lucas Wilkinson	48fc8b1e59	[BugFix] Fix async-scheduling + FlashAttn MLA (#28990 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-19 10:04:07 -05:00
vnadathur	1ffe934c8a	[torch.compile] caching of config fields should be opt-out by default (#26468 ) Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 06:13:54 -08:00
Yanan Cao	2c8b9182b5	[CI] Reorganize compile tests so new tests are automatically included in CI (#28625 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-19 06:13:50 -08:00
Harry Mellor	4f5299f717	Relax Transformers modeling backend MoE experts check (#28952 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 21:50:30 +08:00
Didier Durand	09540cd918	[Doc]: fix typos in various files (#29010 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-19 04:56:21 -08:00
Chen Bruce	da2f6800e0	[Feat][Perf] Enable deepep-low-latency with round-robin expert placement. (#28449 ) Signed-off-by: bruceszchen <bruceszchen@tencent.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 13:46:24 +01:00
Tova Movshovitz	ba558c029a	[config] Expose `get_total_num_hidden_layers()` in ModelConfig (#28961 ) Signed-off-by: tovam <tovam@pliops.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-19 11:37:11 +00:00
Harry Mellor	97cfa99d59	[Docs] Take env var definition out of folded admonition (#29005 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 03:32:04 -08:00
j20120307	bbc6c2f1e5	[CI/Build] Fix broken build on Apple M1 (#28999 ) Signed-off-by: Kan Zhu <j20120307@gmail.com>	2025-11-19 11:07:22 +00:00
ihb2032	8151609583	refactor(cpu_types_scalar.hpp): Unify scalar loop implementations using unroll_loop (#28847 ) Signed-off-by: ihb2032 <1355790728@qq.com> Co-authored-by: lyd1992 <liuyudong@iscas.ac.cn>	2025-11-19 11:05:44 +00:00
Michael Yao	fdf93486d6	[Docs] Clean up moe_kernel_features.md (#28530 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-11-19 02:35:29 -08:00
gnovack	d69062c67a	add support for --fully-sharded-loras in fused_moe (#28761 ) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-19 16:32:00 +08:00
Louie Tsai	ae4821a108	Add CPU support model (#28697 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-11-18 23:47:57 -08:00
Didier Durand	7ed27f3cb5	[Doc]: fix typos in various files (#28945 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-18 22:52:30 -08:00
Michael Goin	a4511e38db	Speed up macOS smoke test (#28954 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-18 22:46:32 -08:00
Roman Solomatin	71d0ae1c54	[Misc] Update embedding/cross encoder tests to use `mteb` v2 (#27329 ) Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-18 22:28:40 -08:00
Lukas Geiger	3d4e7d34be	[Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 05:43:01 +00:00
Uranus	6a25ea5f0e	[Docs] Update oneshot imports (#28188 ) Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>	2025-11-19 05:30:08 +00:00
Gleb Kurchanov	73ff872db0	[Bugfix] Fix typo in Qwen3 Next model executor (#28960 ) Signed-off-by: Gleb Kurchanov <nepherpitou@gmail.com>	2025-11-19 05:21:02 +00:00
Xin Yang	468a8d72ba	[Bugfix] Fix FusedMoEModularKernel for triton backend (#28913 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-19 13:05:22 +08:00
Matthew Bonanni	4c23690f43	[Attention] FlashAttention ViT support, make default backend (#28763 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic	814843e021	Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307 ) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2025-11-19 03:12:31 +00:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jialin Ouyang	40b6b38f2c	[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-19 02:10:02 +00:00
Jerry Zhang	da94c7c0eb	Move online quantization to `model.load_weights` (#26327 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-11-18 16:52:41 -08:00
tomeras91	1395461f5f	[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
Johnny	49ef847aa8	[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938 ) Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnynuca14@gmail.com>	2025-11-18 16:44:27 -08:00
Michael Goin	67745d189f	Supress verbose logs from model_hosting_container_standards (#28949 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-18 12:29:06 -08:00

11450 Commits