3366 Commits

Author SHA1 Message Date
elvischenv
6330f9477d
[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-25 07:59:40 +00:00
Fadi Arafeh
98caeadd54
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei (#29273)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-25 15:11:11 +08:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support (#29327)
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-25 03:28:51 +00:00
Michael Goin
6f1355a1b7
[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 19:01:40 -07:00
Hanjie Qiu
5f9679a43b
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-24 20:13:12 -05:00
Wentao Ye
699bca76c0
[UX] Raise error for attn backend of batch invariant (#29348)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-24 17:49:01 -07:00
Michael Goin
c17610e2ba
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 18:22:46 -05:00
Yan Ma
3cfa63ad99
[XPU] Fix Kimi-VL-A3B-Thinking on XPU (#29309)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-11-24 21:02:21 +00:00
Chenheli Hua
839c6b7b72
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-24 19:24:37 +00:00
bnellnm
8f066146c3
[MoE][Refactor] Make select_experts a non-static method (#29067)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-11-24 13:38:04 -05:00
Laith Sakka
7a228b5305
Add option to use unbacked, and backed size-oblivious, dynamic shapes for more sound compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-24 10:12:41 -05:00
杰兮
8005e606bf
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-11-24 10:16:52 +00:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Zero
30854783ad
[Model] Add OpenCUA-7B support (#29068)
Signed-off-by: lim4349 <rockmanzero@naver.com>
Signed-off-by: Zero <rockmanzero@naver.com>
Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-24 10:27:55 +08:00
Jee Jee Li
1073ba68b0
[LoRA] Optimize 3D MoE logic (#29222)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-24 10:27:23 +08:00
jiahanc
5f96c00c55
[Fix] Add SM check to flashinfer MOE backend (#29144)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-23 00:39:30 +00:00
Federico
f55c76c2b3
chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240)
2025-11-22 08:42:48 -08:00
ZiTian Zhao
d84d8f4429
Fix EVS crash when using video_embeds inputs in Qwen2.5-VL (#29232)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 06:48:59 -08:00
Cyrus Leung
ae66818379
[Misc] Fix pre-commit (#29238)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-22 06:48:01 -08:00
Bram Wasti
5f7209a793
[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-22 21:00:50 +08:00
Nandan Vallamdasu
6965a392a4
Fix: Resolve circular import in model_loader/utils.py (#29189)
Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com>
Signed-off-by: Nandan Vallamdasu  <nandan.vallamdasu@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 04:58:22 -08:00
jinghanhu
988ee66b0d
Handle triton kernel import exception (#29062)
2025-11-22 10:07:50 +00:00
FlintyLemming
052950e5b3
Add fused MoE config for H200 E160 N192 fp8 (#29182)
Signed-off-by: FlintyLemming <admin@flinty.moe>
2025-11-21 17:37:51 -08:00
Lukas Geiger
d045e22dfe
[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-21 17:30:55 -08:00
Varun Sundar Rabindranath
3137991f55
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-21 14:28:17 -08:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Ning Xie
53a1ba6ec5
[log] add weights loading time log to sharded_state loader (#28628)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-21 21:06:09 +00:00
Lucas Wilkinson
1840c5cb18
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-21 11:41:52 -08:00
Mingyuan Ma
b4c8fbaae2
Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 09:54:11 -07:00
rasmith
e99e467384
[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py (#29132)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-21 11:53:09 -05:00
Wentao Ye
a42ab317ac
[Log] Optimize startup log (#28948)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-21 08:46:20 -08:00
Aleksandr Malyshev
b7f1f490a6
Upstream triton fp4 weight preshuffle (#28888)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-11-21 11:34:46 -05:00
Russell Bryant
cca2d2cdbe
[Core] Align whisper closer to other multimodal models (#27292)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-11-21 12:01:54 +00:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
Huamin Li
8ac3a41487
[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 23:53:30 -08:00
Cyrus Leung
0e741c12e3
[Bugfix] Fix Plamo3 rope handling (#29092)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 11:38:35 +08:00
Wentao Ye
56669c1f29
[CI] Fix mypy for vllm/v1/worker (#29037)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 11:36:07 +08:00
Hongxia Yang
3f5f36da3f
[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
2025-11-21 03:30:07 +00:00
Wentao Ye
e1eefa4c40
[Bug] Fix torch warning of tf32 usage (#29112)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 01:54:59 +00:00
Jee Jee Li
9875be6431
[LoRA][2/2] Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Wentao Ye
df44df0143
[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 18:41:49 -07:00
Fanli Lin
a2e9ebe9e2
[BugFix] Fix flash_attn import in siglip2navit.py (#29082)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-11-20 12:14:29 +00:00
Zhewen Li
93c8672ceb
[Bugfix] Fix spec decode memory regression after #28549 (#28819)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-20 19:05:50 +08:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 (#28834)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 02:54:01 -08:00
Anna Shors
6eb745d9bd
Add truncate arg to yarn to match openai implementation of gpt-oss (#28244)
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-20 18:53:50 +08:00
Dezhan
dc45efc8ef
[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577)
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-20 02:52:36 -08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Pleaplusone
7218f83992
[ROCm][BugFix] Fix shared expert loading error when disable VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS (#28633)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 14:50:23 +07:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00