xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-26 21:24:39 +08:00

Author	SHA1	Message	Date
Cyrus Leung	a24ea5414b	[Deprecation] Advance deprecation status (#29617 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 19:04:58 +00:00
Cyrus Leung	ee9841daa9	[Bugfix] Fix doc build on main (#29619 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 09:08:08 -08:00
Matthew Bonanni	fc1d8be3dc	[Attention] Update attention imports (#29540 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-27 11:19:09 -05:00
Didier Durand	66d3d5422c	[Doc]: fixing typos in diverse files (#29492 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-27 07:15:50 -08:00
Jee Jee Li	2f5f9acd55	[LoRA] Continue optimizing MoE LoRA weight loading (#29322 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-27 05:56:28 -08:00
Roger Wang	cf348c8d27	[Bugfix] Fix HunyuanVL XD-RoPE (#29593 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored by: grider-transwithai <grider@transwith.ai>	2025-11-27 12:36:24 +00:00
Cyrus Leung	00d3310d2d	[Bugfix] Update Ultravox compatibility (#29588 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 01:36:18 -08:00
Jinzhen Lin	a67dec7cba	[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-26 19:02:21 -08:00
HDCharles	df01eda4dc	[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878 ) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>	2025-11-26 21:35:13 -05:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
HDCharles	e603129505	[refactor] CTConfig methods to static/class methods (#28870 ) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-26 17:21:58 +00:00
Wentao Ye	0b0aa874e8	[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-26 09:38:52 -07:00
Huamin Li	70d5953f82	Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841 )" (#29483 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-26 22:27:26 +08:00
Yejing Lai	bb706d6048	Fix TeleChatForCausalLM not register issue (#29473 ) Signed-off-by: Lai, Yejing <yejing.lai@intel.com>	2025-11-26 05:15:00 -08:00
Cyrus Leung	e30859dff3	[Bugfix] Fix handling of image embeds in models (#29480 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-26 05:00:15 -08:00
Roger Wang	452a7c9f7c	[Misc] Allow LM only loading for Pixtral (#29451 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-26 05:00:00 -08:00
Xin Yang	53d7f1f601	[Kernel] Use pre-allocated output buffer for triton kernel fused_experts (#29219 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-26 10:21:00 +08:00
George D. Torres	56531b79cc	[Misc] Add backup hash algorithm for FIPS constrained environments (#28795 ) Signed-off-by: George D. Torres <gdavtor@gmail.com> Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-26 00:50:22 +00:00
Michael Goin	7df0289782	Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-25 22:52:31 +00:00
Harry Mellor	0353d2e162	Fix RoPE related failures in Transformers nightly tests (#29333 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 16:23:45 +00:00
Yifan Qiao	48ddb02b79	[Hybrid Allocator] Support KV cache groups with different block_size (#29143 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-25 10:30:57 -05:00
Michael Goin	e502098643	[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-11-25 06:59:07 -08:00
Injae Ryou	794029f012	[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137 ) Signed-off-by: Injae Ryou <injaeryou@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-25 14:28:53 +00:00
elvischenv	6330f9477d	[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-11-25 07:59:40 +00:00
Fadi Arafeh	98caeadd54	[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei (#29273 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-25 15:11:11 +08:00
Isotr0py	92effb07a4	[Model] Add HunyuanOCR support (#29327 ) Signed-off-by: manayang <jackmanayang@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: sergeywang <sergeywang@tencent.com> Co-authored-by: manayang <jackmanayang@gmail.com> Co-authored-by: manayang <manayang@tencent.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-25 03:28:51 +00:00
Michael Goin	6f1355a1b7	[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-24 19:01:40 -07:00
Hanjie Qiu	5f9679a43b	[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688 ) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-24 20:13:12 -05:00
Wentao Ye	699bca76c0	[UX] Raise error for attn backend of batch invariant (#29348 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-24 17:49:01 -07:00
Michael Goin	c17610e2ba	[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-24 18:22:46 -05:00
Yan Ma	3cfa63ad99	[XPU]fix Kimi-VL-A3B-thinking on xpu (#29309 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-11-24 21:02:21 +00:00
Chenheli Hua	839c6b7b72	[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-24 19:24:37 +00:00
bnellnm	8f066146c3	[MoE][Refactor] Make select_experts a non-static method (#29067 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-11-24 13:38:04 -05:00
Laith Sakka	7a228b5305	Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-24 10:12:41 -05:00
杰兮	8005e606bf	[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-11-24 10:16:52 +00:00
Roger Wang	0ff70821c9	[Core] Deprecate `xformers` (#29262 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-24 04:18:55 +00:00
Zero	30854783ad	[Model] Add OpenCUA-7B support (#29068 ) Signed-off-by: lim4349 <rockmanzero@naver.com> Signed-off-by: Zero <rockmanzero@naver.com> Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-24 10:27:55 +08:00
Jee Jee Li	1073ba68b0	[LoRA] Optimize 3D MoE logic (#29222 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-24 10:27:23 +08:00
jiahanc	5f96c00c55	[Fix] Add SM check to flashinfer MOE backend (#29144 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-23 00:39:30 +00:00
Federico	f55c76c2b3	chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240 )	2025-11-22 08:42:48 -08:00
ZiTian Zhao	d84d8f4429	Fix EVS crash when using `video_embeds` inputs in Qwen2.5-VL (#29232 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 06:48:59 -08:00
Cyrus Leung	ae66818379	[Misc] Fix pre-commit (#29238 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 06:48:01 -08:00
Bram Wasti	5f7209a793	[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-22 21:00:50 +08:00
Nandan Vallamdasu	6965a392a4	Fix: Resolve circular import in model_loader/utils.py (#29189 ) Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com> Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 04:58:22 -08:00
jinghanhu	988ee66b0d	Handle triton kernel import exception (#29062 )	2025-11-22 10:07:50 +00:00
FlintyLemming	052950e5b3	Add fused MoE config for H200 E160 N192 fp8 (#29182 ) Signed-off-by: FlintyLemming <admin@flinty.moe>	2025-11-21 17:37:51 -08:00
Lukas Geiger	d045e22dfe	[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-21 17:30:55 -08:00
Varun Sundar Rabindranath	3137991f55	[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-21 14:28:17 -08:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Ning Xie	53a1ba6ec5	[log] add weights loading time log to sharded_state loader (#28628 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-21 21:06:09 +00:00

3389 Commits