xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-28 16:37:07 +08:00

Author	SHA1	Message	Date
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jerry Zhang	da94c7c0eb	Move online quantization to `model.load_weights` (#26327) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-11-18 16:52:41 -08:00
tomeras91	1395461f5f	[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
Isotr0py	e4bb2684bc	[Models] Replace all `nn.Conv2d` with vLLM's Conv2dLayer (#28842) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 18:56:04 +00:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
Canlin Guo	b9489f51e1	[Model][Perf] Use cos and sin cache in QwenVL (#28798) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-18 11:51:54 +00:00
Ning Xie	0168f69e50	[Misc] Remove unnecessary parentheses from log statements (#28897) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-17 20:33:46 -08:00
Wentao Ye	3ddcf46011	[Refactor] Remove Unused Func in Batch Invariant (#28881) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 20:29:29 -08:00
xuebwang-amd	d0a73620cc	[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 11:16:45 +08:00
Pranav	f77bce001a	[Model] Add Afmoe architecture implementation (#28332) Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> Signed-off-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>	2025-11-17 15:11:20 -08:00
Shreyas Kulkarni	95ae50b7d1	[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435) Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>	2025-11-17 15:01:34 -08:00
Zhewen Li	f8b19c0ffd	[Bugfix] Fix GPT-OSS on AMD after #28603 (#28816) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-17 13:15:26 -05:00
wuyaoxuehun	ab01cd14e5	[BugFix] Fix glm4_moe_mtp load weights bug (#28805) Signed-off-by: wuyaoxuehun <798143193@qq.com>	2025-11-17 17:13:11 +08:00
jiahanc	561253b37f	[Performance][Fix] update nvfp4 code to support renorm routing (#28569) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-16 18:02:42 -08:00
amirkl94	03ee48111d	Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)	2025-11-16 13:39:44 -05:00
Lukas Geiger	5a87076d6e	[Model][QwenVL] Optimize `Qwen2_5_VisionAttention` q,k preparation (#28769) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-16 17:37:15 +00:00
Anna Shors	8d259fad6c	Fix gpt oss weight loading with EP + bf16 (#28765) Signed-off-by: ashors1 <ashors@nvidia.com>	2025-11-16 13:12:45 +00:00
Dezhan	af02c40970	Fixed gpt-oss _load_weights_other() parameter position bug (#28715) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-16 09:46:29 +00:00
Lukas Geiger	07cadab27a	[Model][Qwen3VL] Cache positional embedding indices (#28475) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-15 19:03:09 +00:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
hwhaokun	085a525332	[Model] Fix lmhead init bug of bailing_moe (#28777) Signed-off-by: hwhaokun <haokun0405@163.com> Co-authored-by: zhaozx-cn <zhaozx2116@163.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-15 05:44:12 -08:00
tingtinggithub	cb15ee28db	Allow Gemma3 to take image embeddings (#28483) Signed-off-by: tingtinggithub <streamttt@gmail.com>	2025-11-15 04:18:08 -08:00
Zhewen Li	1ec978c209	[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709) Signed-off-by: Zhewen Li <zhewenli@meta.com>	2025-11-15 01:10:48 -08:00
Varun Sundar Rabindranath	6965ef436f	[Performance][DeepGEMM] Estimate expected_m (#28694) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-15 13:52:14 +08:00
Lukas Geiger	f05d474c8a	[Model][Qwen3VL] Use `mm_position` to compute mrope positions (#28730) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 19:45:11 -08:00
Thomas Parnell	e0c910bb89	[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-14 22:55:42 +00:00
Alexander Matveev	e5c78956c0	[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-14 14:13:46 -08:00
Andrey Khalyavin	fd4555089a	[BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>	2025-11-14 10:58:18 -08:00
GuanH	cec275efce	[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663) Signed-off-by: GuanH <guansdrailib@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-14 18:44:27 +00:00
TJian	a425dc256e	[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-14 10:30:50 -08:00
Fardin Hoque	964d65deed	LLaMA4 LoRA Adapter Enablement (#28602) Signed-off-by: Fardin Hoque <kfhfar@amazon.com> Co-authored-by: Wei Wei <wwei6@meta.com>	2025-11-14 13:27:56 -05:00
Harry Mellor	5f3cd7f7f2	[Docs] Update the name of `Transformers backend` -> `Transformers modeling backend` (#28725) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 16:34:14 +00:00
dongbo910220	c934caee88	[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-11-14 16:07:20 +00:00
Duncan Moss	3f8a874065	[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-14 08:02:44 -08:00
zhaozx-cn	433c0f8675	[Model] Fix bailing_moe accuracy problem (#28277) Signed-off-by: zhaozx-cn <zhaozx2116@163.com>	2025-11-14 13:33:02 +00:00
Shanshan Shen	41b92f7d38	[Model][MM] Extract conv layer as CustomOp (#28455) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-14 19:16:13 +08:00
Jiangyun Zhu	c36bcfe6b3	[Bugfix] fix dots.ocr pp support (#28705) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-14 09:01:26 +00:00
haoyangli-amd	0b25498990	[Misc] add ignore mapper for quark quantization (#28275) Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>	2025-11-14 05:56:35 +00:00
Hank_	4d5943bda6	[quantization][config] enable override existing quant_config (#28510) Signed-off-by: Hank <hcc.mayday@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-14 01:24:10 +00:00
Varun Sundar Rabindranath	fe1cd7704d	[Performance][B200] silu_mul_quant: pack scales in int32 (#28358) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-13 10:16:55 -08:00
Yuanping Song	3035d1a166	[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617) Signed-off-by: Yuanping Song <yuanping.song@outlook.com>	2025-11-13 15:24:35 +00:00
zofia	c47b6c85ac	[XPU] add sym params to IPEXConfig (#28611) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2025-11-13 11:35:04 +00:00
Zijing Liu	5e973209aa	[BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter (#28603) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>	2025-11-13 11:30:04 +00:00
Jiangyun Zhu	fa183e9271	[Bugfix] fix kimi-linear crash (#28445) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-13 07:59:58 +00:00
Lucia Fang	7e082bc14e	Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-11-12 21:40:45 -08:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
Harry Mellor	51c599f0ec	Skip models that cannot currently init on Transformers v5 (#28471) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 23:43:57 +00:00
Alexander Matveev	69d0e90313	[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 23:37:24 +00:00


3294 Commits