xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-24 00:15:01 +08:00

Author	SHA1	Message	Date
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath	b039bfda8f	[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 09:21:52 -08:00
Cyrus Leung	d0e186c16f	[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoPE (#28395 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 00:30:06 +08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
caozuoba	40e2eeeb92	[Kernel] Optimization of the mm_k operator. (#28280 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-10 16:03:46 +00:00
zejunchen-zejun	b06b9470ca	[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-11-10 10:38:56 -05:00
TJian	4673e465ff	Add @tjtanaa to codeowner for ROCm and multi-modal (#28360 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-10 21:39:17 +08:00
Ferrebo	912744d066	[Fix] optimize visual token mask with caching and multi-token support (#28374 ) Signed-off-by: Ferrebo <itachi971009@gmail.com> Signed-off-by: kebo01 <kebo01@baidu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 13:23:49 +00:00
Yu Jiaqi	15be507c86	[bugfix] fix siglip batch text output error (#28365 ) Signed-off-by: piood <2477084691@qq.com>	2025-11-10 21:21:15 +08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Shinichi Hemmi	a98cc35c34	Restore PlaMo2 unit test as `pfnet/plamo-2-1b` now supports `transformers >=4.56` (#28019 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-10 06:50:02 +00:00
Lucas Wilkinson	e8697faf03	[V0 deprecation] Remove no longer used `get_metadata_cls` (#28370 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 14:32:09 +08:00
Xiake Sun	03fa4d3fb3	[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com> Signed-off-by: Xiake Sun <xisun@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath	6b2b9fd934	[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 10:45:29 +08:00
JartX	c5f685b3ae	[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-11-09 23:09:36 +00:00
Jiangyun Zhu	c4768dcf47	[Kernel] Fix fused_gdn_gating (#28343 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-09 14:26:35 -07:00
Zhewen Li	a65a934ebe	[CI/Build] Temporary fix to LM Eval Small Models (#28324 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-09 21:08:38 +00:00
usberkeley	4a8d6bd168	Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-09 19:11:46 +00:00
Lucas Wilkinson	636efd10a5	[Core] Separate out attention metadata building logic from prepare inputs (#26764 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-09 13:51:43 -05:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
Jiangyun Zhu	7ae5a5fb11	[Misc] Add some comments in qwen3-next (#28267 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-08 23:59:24 -08:00
Yong Hoon Shin	de2b78305f	[ROCm] Add env to enable/disable aiter triton gemm (#28321 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-08 22:27:00 -08:00
Ning Xie	e5e9067e61	[Misc] fix typo and add detailed log (#28178 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-09 05:33:46 +00:00
yihong	3a7d580343	fix: close issue 28338 by fixed python version (#28339 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-09 05:07:26 +00:00
Kevin H. Luu	05f8d69077	[chore] Move some wikimedia images to S3 (#28351 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad	404d7a9d14	[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345 ) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>	2025-11-08 15:50:10 -07:00
ElizaWszola	171133f929	[Bugfix] Fix test fused quant layernorm tests (#27865 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-08 14:31:33 -08:00
Cole Murray	32787d0644	Remove setuptools upper bound constraint (<80) (#28337 ) Signed-off-by: Cole Murray <colemurray.cs@gmail.com>	2025-11-08 22:30:18 +00:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
Ev Lacey	77d702a22b	Enhance run_cluster.sh for multi-NIC support (#28328 ) Signed-off-by: Ev Lacey <elacey@nvidia.com>	2025-11-08 22:04:16 +00:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696 ) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
Robert Shaw	26990d25dc	[Bugfix] Update device name for H200 detection (#28349 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-08 19:01:11 +00:00
Harry Mellor	d9ab1ad9d1	`reasoning_content` -> `reasoning` (#27752 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-08 12:15:08 +00:00
22quinn	608bb14462	[Attention] Remove max cudagraph size limit of 992 (#27840 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-07 22:33:27 -08:00
Xiaozhu Meng	4a36681f85	[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins (#27990 ) Signed-off-by: Xiaozhu <mxz297@gmail.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi	d15afc1fd0	Refactor CPU/GPU extension targets for CMake build (#28026 ) Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>	2025-11-08 14:17:35 +08:00
Isotr0py	934a9c3b79	[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 05:01:27 +00:00
gnovack	70af44fd10	[bugfix] support eagle with lora cudagraph specialization (#28318 ) Signed-off-by: gnovack <gnovack@amazon.com>	2025-11-08 03:25:45 +00:00
Aurick Qiao	781f5ebf52	Bump arctic-inference requirement (#28174 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:31:18 -08:00
Michael Goin	0852527647	[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:20:55 -08:00
Hamid Mukhtar	61d25dc44b	Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308 ) Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>	2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068 ) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
Kunshang Ji	1aaecda078	[XPU] Enable Expert parallel for MoE models (#28263 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 00:33:11 +00:00
Harry Mellor	811df41ee9	Update Flashinfer from `v0.4.1` to `v0.5.2` (#27952 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 16:24:42 -08:00
Nick Hill	67a2da890e	[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 22:11:03 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Benjamin Chislett	18903216f5	[Bugfix] Fix and add tests for GptOss reasoning parser (#28000 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-07 19:28:04 +00:00


12170 Commits