xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-20 08:25:01 +08:00

Author	SHA1	Message	Date
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
Harry Mellor	51c599f0ec	Skip models that cannot currently init on Transformers v5 (#28471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 23:43:57 +00:00
Canlin Guo	bc5bd45c7d	[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#28271 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-12 15:56:47 +00:00
Jee Jee Li	a9d18b5107	[Bugfix] Fix gpt_oss packed_modules_mapping (#28536 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-12 21:02:06 +08:00
wuyaoxuehun	d3ade61e42	[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597 ) Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com> Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com>	2025-11-12 10:14:00 +00:00
yyzxw	1761dea1a8	[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733 ) Signed-off-by: zxw <1020938856@qq.com>	2025-11-12 09:03:56 +00:00
Fanli Lin	b9ce9a3013	[BugFix] Add fallback path in `apply_rotary_pos_emb_flashattn` for non-cuda platforms (#28447 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-12 03:13:21 +00:00
Lukas Geiger	cbb799e314	[Model][Qwen3VL] Simplify `get_mrope_input_positions` using numpy (#28302 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-12 02:55:10 +00:00
Jee Jee Li	9d1c474704	[LoRA][1/N]Remove LoRA extra vocab (#28382 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-11 11:06:21 -08:00
Fanli Lin	d5edcb8678	[BugFix] Fix Siglip2Attention on XPU (#28448 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 18:18:02 +00:00
xuebwang-amd	5a1271d83a	[Quantization] fix attention quantization of gpt_oss model (#27334 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2025-11-11 12:06:00 -05:00
Fanli Lin	b886068056	[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 15:29:33 +00:00
Cyrus Leung	afffd3cc8a	[Model] Pass `mm_features` directly into `get_mrope_input_positions` (#28399 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 21:14:48 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Lukas Geiger	9973e6e04a	[Model][Qwen3VL] Slighly speedup `fast_pos_embed_interpolate` (#28434 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 10:35:10 +00:00
Fanli Lin	c7991269dd	[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla'` (#28387 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 08:45:38 +00:00
Jiangyun Zhu	f0359fffa4	[Bugfix] fix qwen3-next crash (#28202 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-11 08:24:28 +00:00
Roger Wang	4fd4b743a2	[Bugfix] Fix max image size for PaddleOCR-VL (#28442 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-11 08:07:24 +00:00
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
Cyrus Leung	d0e186c16f	[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoPE (#28395 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 00:30:06 +08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Ferrebo	912744d066	[Fix] optimize visual token mask with caching and multi-token support (#28374 ) Signed-off-by: Ferrebo <itachi971009@gmail.com> Signed-off-by: kebo01 <kebo01@baidu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 13:23:49 +00:00
Yu Jiaqi	15be507c86	[bugfix] fix siglip batch text output error (#28365 ) Signed-off-by: piood <2477084691@qq.com>	2025-11-10 21:21:15 +08:00
Jiangyun Zhu	c4768dcf47	[Kernel] Fix fused_gdn_gating (#28343 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-09 14:26:35 -07:00
Jiangyun Zhu	7ae5a5fb11	[Misc] Add some comments in qwen3-next (#28267 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-08 23:59:24 -08:00
Mohammad Miadh Angkad	404d7a9d14	[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345 ) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>	2025-11-08 15:50:10 -07:00
Isotr0py	934a9c3b79	[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 05:01:27 +00:00
Lukas Geiger	e0919f331d	[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-07 12:14:29 +00:00
Kevin H. Luu	8e19d470af	[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-07 12:09:09 +00:00
Mengqing Cao	1958bda9b4	[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-11-07 19:38:38 +08:00
Harry Mellor	c0a4b95d64	Fix issues from #28242 (#28257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 04:23:17 +00:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
Julien Denize	7a8375f8a0	Add llama 4 scaling support (#28145 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-06 18:55:17 +00:00
Seungduk Kim	201dc98acc	Fix hard-coded parameter name in gemma3n.py (#27946 ) Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-05 23:07:36 -08:00
Isotr0py	43ecd0a900	[Chore] Clean up deepseek v2/v3 config copy (#28055 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 03:46:30 +00:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
wang.yuqi	802748bddb	[Bugfix] Fix Qwen3-Reranker-8B load (#28117 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-05 18:33:50 +00:00
Chen Zhang	c765f0b443	[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-05 09:25:32 -08:00
Jiangyun Zhu	c18f88c6ca	[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-05 09:14:55 -08:00
Isotr0py	3f5a4b6473	[Bugfix] Validate custom logits processor xargs for online serving (#27560 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-05 16:53:33 +00:00
Ilya Markov	e50c454672	[BugFix] Support EP/DP + EPLB with MTP (#25311 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-05 15:22:17 +00:00
Alex Brooks	b7cbc25416	[Model, Core] Support Granite Speech & LoRA for STT (#24455 )	2025-11-05 08:33:48 +01:00
Isotr0py	0ff05e3770	[Bugfix] Fix encoder-only model support for transformers backend (#28021 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-04 22:24:41 -08:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Kunshang Ji	18b39828d9	[XPU] Add gpt-oss model support for Intel GPU (#27786 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-05 02:17:23 +00:00
Vadim Gimpelson	d4e547bb7e	Revert "[PERF] Decouple projections from GDN custom op" (#28080 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 15:58:23 -08:00
Aleksandr Malyshev	2d977a7a9e	[ROCm] gemm_a16w16 upstreaming (#26969 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-04 16:01:00 -05:00
yt0428	05cae69f0f	[model] Add support for openPangu_Ultra_MoE (#27521 ) Signed-off-by: yuantao <2422264527@qq.com> Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 08:17:20 -08:00
Vadim Gimpelson	5fd8f02ea9	[PERF] Decouple projections from GDN custom op (#27512 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 08:11:41 -08:00
tomeras91	77f8001f53	[Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-04 12:28:36 +00:00

1827 Commits