3559 Commits

Author SHA1 Message Date
Anexdeus
d525556a25 Revert the mixin changes 2025-12-20 13:31:53 +03:00
Anexdeus
b03d1a04a8 added ProcessingInfoMixin for QwenVL series models 2025-12-20 12:29:46 +03:00
Anexdeus
36121c6db0 fixed property bug in processor and added abstract methods in BaseProcessingInfo 2025-12-17 01:31:34 +03:00
Jee Jee Li
94dce5c3d9
Merge branch 'main' into mlm-full-lora-support 2025-12-17 00:33:42 +08:00
Harry Mellor
0b0acc758e
Remove head_mask from Ultravox and Swin (#30764)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:41 -08:00
Ming Yang
ce12b407f2
[TRTLLM] Remove the MoE GEMM weight name change (#30713)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-16 11:01:38 -05:00
Wentao Ye
59bd5f6a71
[Feat] Enable eplb with default all2all backend (#30559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 10:33:52 -05:00
Harry Mellor
6f15ac5de7
Don't assume position_embedding_type will be present for BERT and RoBERTa models (#30770)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 13:40:26 +00:00
B-201
bdac2b5d17
Merge branch 'main' into mlm-full-lora-support 2025-12-16 19:13:22 +08:00
Isotr0py
e94384bbad
[Bugfix] Fix broken ViT attention selection for Blackwell device (#30731)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 05:24:32 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873)
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
Matthew Bonanni
60dbf7d8f1
Update batch invariant to use attention config (#30704)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 15:24:16 -05:00
Robert Shaw
d0502b4928
[MoE][Refactor 1/N] Separate Online Quantization (#30627)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-15 06:54:53 -08:00
Max Hu
3f175f18a2
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
2025-12-15 14:06:01 +00:00
duke
e4806d973a
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm model (#30674)
Signed-off-by: root <iwzbi@zju.edu.cn>
Co-authored-by: root <iwzbi@zju.edu.cn>
2025-12-15 10:38:29 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model (#30666)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) (#28439)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel (#30282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Shanshan Shen
738648fb81
[CustomOp] Support object-level enable for CustomOp (#30547)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-12-15 11:02:09 +08:00
ZiTian Zhao
ae88aada38
[Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29752)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: deitxfge <huhaibo1990@126.com>
2025-12-14 05:24:56 -08:00
zifeitong
48b8456ff9
[Bugfix] Revert Qwen2-VL part of change in #28271 (#30542)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
2025-12-14 05:20:08 -08:00
tjp_zju
6ecc1e411b
[Bugfix] fix _get_quant_method of FusedMoE for deepseekV3.2 on non-NV… (#30057)
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com>
2025-12-14 02:20:51 -08:00
Shengliang Xu
0bb0bae436
Nvidia ModelOpt workaround for issue 28072 (#30164)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2025-12-14 18:18:31 +08:00
Ilya Markov
3224ea9915
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-14 18:15:11 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support (#30539)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Wentao Ye
6e78ed6ba7
[Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-13 16:12:53 -05:00
Chen Zhang
ace34e3783
[Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} (#30433)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-12-13 22:12:45 +08:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Tsukasa OI
fdc135d768
[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization (#30310)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-13 13:55:14 +08:00
Roberto L. Castro
4fa7ce46f3
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
rasmith
08f8a5627e
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 18:41:56 -05:00
danielafrimi
13618626df
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748)
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-12 20:42:32 +00:00
Xin Yang
1f19d8f899
[Perf] Set split_k to 1 for triton_kernels (#30528)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-12-12 14:07:57 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
Christina Norman
dc13c99eed
fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408)
Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 23:10:12 +08:00
Lucas Wilkinson
3e41992fec
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-12 05:57:47 -08:00
Jaehwang Jung
f90319d5d1
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692) 2025-12-12 02:27:20 -08:00
Jee Jee Li
421707dec1
Merge branch 'main' into mlm-full-lora-support 2025-12-12 15:00:59 +08:00
prashanth058
5e78570cce
update packed modules mapping (#11)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
2025-12-12 13:55:32 +08:00
Michael Goin
9f2fc16a69
[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-12 02:53:57 +00:00
Bhanu Prakash Voutharoja
6a6fc41c79
gptq marlin quantization support for fused moe with lora (#30254)
Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
2025-12-12 02:27:22 +00:00
jiahanc
0ab23c2b2b
[fix] fix SM check for Flashinfer TRTLLM MOE (#30314)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-12 01:00:58 +00:00
Andrew Briand
a00d88973d
[EPLB] Support EPLB w/ NVFP4 (#29804)
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
2025-12-11 22:59:40 +00:00
Wentao Ye
c817b14151
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement (#30494)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: li-jinpeng <3332126450@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-11 17:28:34 -05:00
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching (#29421)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-11 21:06:51 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
汪志鹏
0e71eaa644
[Feature] AWQ marlin quantization support for fused moe with lora (#30442)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-12-11 18:03:32 +00:00
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00