xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-06 19:32:05 +08:00

Author	SHA1	Message	Date
Lukas Geiger	f05d474c8a	[Model][Qwen3VL] Use `mm_position` to compute mrope positions (#28730 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 19:45:11 -08:00
Thomas Parnell	e0c910bb89	[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-14 22:55:42 +00:00
Alexander Matveev	e5c78956c0	[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-14 14:13:46 -08:00
Andrey Khalyavin	fd4555089a	[BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728 ) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>	2025-11-14 10:58:18 -08:00
GuanH	cec275efce	[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663 ) Signed-off-by: GuanH <guansdrailib@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-14 18:44:27 +00:00
TJian	a425dc256e	[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-14 10:30:50 -08:00
Fardin Hoque	964d65deed	LLaMA4 LoRA Adapter Enablement (#28602 ) Signed-off-by: Fardin Hoque <kfhfar@amazon.com> Co-authored-by: Wei Wei <wwei6@meta.com>	2025-11-14 13:27:56 -05:00
Harry Mellor	5f3cd7f7f2	[Docs] Update the name of `Transformers backend` -> `Transformers modeling backend` (#28725 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 16:34:14 +00:00
dongbo910220	c934caee88	[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711 ) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-11-14 16:07:20 +00:00
Duncan Moss	3f8a874065	[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-14 08:02:44 -08:00
zhaozx-cn	433c0f8675	[Model] Fix bailing_moe accuracy problem (#28277 ) Signed-off-by: zhaozx-cn <zhaozx2116@163.com>	2025-11-14 13:33:02 +00:00
Shanshan Shen	41b92f7d38	[Model][MM] Extract conv layer as CustomOp (#28455 ) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-14 19:16:13 +08:00
Jiangyun Zhu	c36bcfe6b3	[Bugfix] fix dots.ocr pp support (#28705 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-14 09:01:26 +00:00
haoyangli-amd	0b25498990	[Misc] add ignore mapper for quark quantization (#28275 ) Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>	2025-11-14 05:56:35 +00:00
Hank_	4d5943bda6	[quantization][config] enable override existing quant_config (#28510 ) Signed-off-by: Hank <hcc.mayday@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-14 01:24:10 +00:00
Varun Sundar Rabindranath	fe1cd7704d	[Performance][B200] silu_mul_quant: pack scales in int32 (#28358 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-13 10:16:55 -08:00
Yuanping Song	3035d1a166	[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617 ) Signed-off-by: Yuanping Song <yuanping.song@outlook.com>	2025-11-13 15:24:35 +00:00
zofia	c47b6c85ac	[XPU] add sym params to IPEXConfig (#28611 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2025-11-13 11:35:04 +00:00
Zijing Liu	5e973209aa	[BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter (#28603 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>	2025-11-13 11:30:04 +00:00
Jiangyun Zhu	fa183e9271	[Bugfix] fix kimi-linear crash (#28445 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-13 07:59:58 +00:00
Lucia Fang	7e082bc14e	Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-11-12 21:40:45 -08:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
Harry Mellor	51c599f0ec	Skip models that cannot currently init on Transformers v5 (#28471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 23:43:57 +00:00
Alexander Matveev	69d0e90313	[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 23:37:24 +00:00
vllmellm	d8140b9833	[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in `_aiter_ops.py` (#28464 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath	74a9a9faad	[Performance][B200] Fix deepgemm prologue (#27897 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-12 13:13:03 -08:00
PerryZhang01	a1e7fa362a	[EPLB][ROCm]: support EPBL for ROCm backend (#27731 ) Signed-off-by: Perry Zhang <perzhang@amd.com> Co-authored-by: Perry Zhang <perzhang@amd.com>	2025-11-12 18:16:35 +00:00
Canlin Guo	bc5bd45c7d	[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#28271 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-12 15:56:47 +00:00
Alexander Matveev	f76e85c299	[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) (#28492 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 10:51:43 -05:00
Harry Mellor	54aecd9ed5	Fix pre-commit (and XPU) on `main` (#28556 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 06:13:41 -08:00
Jee Jee Li	a9d18b5107	[Bugfix] Fix gpt_oss packed_modules_mapping (#28536 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-12 21:02:06 +08:00
wuyaoxuehun	d3ade61e42	[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597 ) Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com> Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com>	2025-11-12 10:14:00 +00:00
yyzxw	1761dea1a8	[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733 ) Signed-off-by: zxw <1020938856@qq.com>	2025-11-12 09:03:56 +00:00
Lukas Geiger	ac0bb2c307	[Core] Cache `vllm_is_batch_invariant` (#28304 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-12 05:03:01 +00:00
Fanli Lin	b9ce9a3013	[BugFix] Add fallback path in `apply_rotary_pos_emb_flashattn` for non-cuda platforms (#28447 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-12 03:13:21 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Lukas Geiger	cbb799e314	[Model][Qwen3VL] Simplify `get_mrope_input_positions` using numpy (#28302 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-12 02:55:10 +00:00
Michael Goin	e5f599d4d1	[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 23:16:12 +00:00
Jee Jee Li	9d1c474704	[LoRA][1/N]Remove LoRA extra vocab (#28382 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-11 11:06:21 -08:00
Lukas Geiger	76e4dcf225	[Misc] Remove unused attention prefix prefill ops functions (#26971 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 18:26:04 +00:00
Fanli Lin	d5edcb8678	[BugFix] Fix Siglip2Attention on XPU (#28448 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 18:18:02 +00:00
xuebwang-amd	5a1271d83a	[Quantization] fix attention quantization of gpt_oss model (#27334 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2025-11-11 12:06:00 -05:00
xuebwang-amd	05576df85c	[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: fxmarty-amd <felmarty@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-11 12:05:22 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Michael Goin	f9a4087182	Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 11:46:04 -05:00
Fanli Lin	b886068056	[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 15:29:33 +00:00
bnellnm	a1448b4b69	[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064 )	2025-11-11 07:29:02 -07:00
Cyrus Leung	afffd3cc8a	[Model] Pass `mm_features` directly into `get_mrope_input_positions` (#28399 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 21:14:48 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00

3269 Commits