xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-28 12:17:12 +08:00

Author	SHA1	Message	Date
Yan Ma	7e2729b57e	[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yejing Lai <yejing.lai@intel.com> Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-01 04:45:02 +00:00
Jee Jee Li	3a5de7d2d6	[Bugfix] Fix KDA output (#27905 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 11:54:36 +08:00
Jee Jee Li	bc4486d609	[Kernel] Enable FusedMoEModularKernel support bias (#27754 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 02:05:12 +00:00
Shu Wang	fc16f1c477	Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223 ) Signed-off-by: Shu Wang. <shuw@nvidia.com>	2025-10-31 17:54:29 +00:00
ZiTian Zhao	bc306fe5e9	fix incorrect type annotation in KimiMLP (#27885 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-10-31 17:38:02 +00:00
Isotr0py	7e06c40e63	[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-31 17:04:51 +00:00
Jiangyun Zhu	3857eb8725	[Perf] Decouple torch op from GDA to leverage torch.compile (#27871 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-31 21:35:52 +08:00
toncao	e5ef4dfc11	[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants (#27834 ) Signed-off-by: toncao <cpatonn@gmail.com> Co-authored-by: toncao <cpatonn@gmail.com>	2025-10-31 17:36:37 +08:00
Paul Zhang	e7acb20076	[Feature] Batch invariant torch.compile (#27660 ) Signed-off-by: PaulZhang12 <paulzhan@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-30 13:11:29 -07:00
Tyler Michael Smith	ab98f6556f	[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-30 11:52:18 -07:00
Roger Meier	2918c1b49c	[Model] Use the same fused_moe configs for all H200 devices (#23642 ) Signed-off-by: Roger Meier <r.meier@siemens.com>	2025-10-30 17:36:56 +00:00
Mengqing Cao	1004205795	[MTP] Refactor mtp predictor to avoid d2h operation (#27643 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-30 17:27:39 +00:00
Varun Sundar Rabindranath	e5e076cad7	[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-30 08:24:31 -07:00
Li, Jiang	eebf00cb0c	[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-10-30 15:12:05 +00:00
Fan Yin	9956aae4ea	[Model][Ouro] Support Ouro Model (#27794 ) Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-30 22:34:41 +08:00
Zhiyuan Li	4e68cc9b6a	[Model] Introduce Kimi Linear to vLLM (#27809 ) Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-30 21:02:27 +08:00
wang.yuqi	4464723f22	[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-30 12:13:05 +00:00
Zhewen Li	e806178d2a	[BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790 ) Signed-off-by: zhewenli <zhewenli@meta.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-30 07:54:44 +00:00
Bram Wasti	ded8ada86a	Add more dims for batch invariant shims (#27489 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-30 05:28:45 +00:00
Benjamin Bartels	17d055f527	[Feat] Adds runai distributed streamer (#27230 ) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev> Co-authored-by: omer-dayan <omdayan@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-29 21:09:10 -07:00
Yan Ma	b798e39f93	[XPU][bugfix] fix rope for llama4 and deepseek (#25145 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-10-30 09:43:13 +08:00
Chenheli Hua	48eb8eba58	[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (#27760 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 23:17:48 +00:00
Wentao Ye	b5d90f7400	[Bug] Fix DBO IMA issue for DeepEPHT (#27666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-29 16:28:27 -04:00
Wentao Ye	fcb1d570bb	[Bug] Fix DeepEP low latency `assert self.batched_router_logits.size(-1) == full_router_logits.size(-1)` Bug (#27682 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-29 14:50:39 -04:00
JartX	7568a282b9	[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-29 16:55:35 +00:00
Roger Young	d6704dd099	Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627 ) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-29 21:01:05 +08:00
Jiangyun Zhu	8df98c2161	[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next (#27578 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-29 08:12:54 +00:00
Zhewen Li	8b62495076	[Bugfix] Fix non-contiguous tensor error in `rocm_unquantized_gemm_impl` (#27605 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 00:00:15 -07:00
Lukas Geiger	0d8161b075	[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 05:28:20 +00:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
Wentao Ye	6afc28a9ba	[Test] Batch Invariant: Unit test using parameterized backend (#27478 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 13:51:35 -07:00
Zhiyuan Li	e88bdd60d9	[FLA] Introduce Kimi Delta Attention(KDA) to VLLM (#27654 ) Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>	2025-10-28 22:56:28 +08:00
Asaf Joseph Gardin	05181cc57f	[Hybrid] Add mamba_block_size to Engine Args (#27289 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-10-28 12:54:24 +00:00
Wentao Ye	0484b64248	[Bug] Fix shape issue for eplb expert weights (#27589 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 20:44:05 +08:00
Matthew Bonanni	44b5ce956d	[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-28 12:00:56 +00:00
Li, Jiang	d34f5fe939	[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms (#27526 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-27 23:25:44 -07:00
Eric Yue	bdb01a38fe	[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-10-27 22:58:06 -07:00
tingtinggithub	23ad820553	fixing mm placeholder replacement issue with gemma3 (#27538 ) Signed-off-by: tingtingtang1992 <streamttt@gmail.com>	2025-10-27 14:34:01 +00:00
Varun Sundar Rabindranath	5d3be3ba4c	[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-27 07:32:50 -07:00
Yu Jiaqi	4f882be4a0	[Model] Siglip2 Model Support (#27566 ) Signed-off-by: piood <2477084691@qq.com>	2025-10-27 06:57:37 -07:00
Asaf Joseph Gardin	9273754222	[Hybrid] Added supports_mamba_prefix_caching Protocol (#27339 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-10-27 13:05:20 +00:00
Cyrus Leung	7c2bdb83dc	[Misc] Clean up utils (#27552 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 09:05:40 +00:00
Danielle Robinson	9932ed6a83	[Kernel] Adding split_K implementation for fused_moe_lora (#27291 ) Signed-off-by: Danielle Robinson <dmmaddix@amazon.com> Signed-off-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Danielle Robinson <dmmaddix@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-27 02:05:24 -07:00
Jee Jee Li	2d631d28c6	[Doc] Slight improvement to M2 and beyond (#27554 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-27 09:02:10 +00:00
Cyrus Leung	cbd5e07a51	[Model] Use merge_by_field_config for MM models (Qwen series) (#27546 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 05:38:05 +00:00
CSWYF3634076	63b22e0dbb	[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple (#27316 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-10-26 20:53:31 -07:00
Roger Young	5980604c44	Fix MiniMax-M2 copyright (#27537 ) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-27 03:29:51 +00:00
Roger Young	720af6ab79	[Model][MiniMax-M2] Support MiniMax-M2 Model (#27535 ) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-27 00:59:11 +08:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00

3294 Commits