xinyun/vllm - vllm - 丝路新云 code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-01-25 18:24:33 +08:00

Author	SHA1	Message	Date
Zuyi Zhao	bca74e32b7	[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com> Signed-off-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-11 04:57:01 +00:00
Zhuohan Li	8d706cca90	[Misc] FlattenLogprobs -> FlatLogprobs (#28335 )	2025-11-11 03:41:23 +00:00
Xin Yang	57201a6a4c	Fix rotary embedding benchmark script (#28323 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-10 21:57:12 -05:00
Michael Goin	f2d9ad0620	Only register rocm_aiter_ops if aiter is found (#28428 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 02:53:24 +00:00
Wentao Ye	de540c0354	[Feature] Add env var `VLLM_MOE_USE_DEEP_GEMM` (#28422 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 02:29:48 +00:00
Lucas Wilkinson	39029d5192	[CI/Test Fix] Fix CP tests on Blackwell (#28404 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 01:36:29 +00:00
Wentao Ye	35d801f13f	[Feature] Refactor batch invariant fp8 DeepGEMM (#27606 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 00:08:40 +00:00
Matthew Bonanni	0bf29fadf5	[Test] Remove old non-varlen FA2 test (#28420 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-10 23:57:41 +00:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Yong Hoon Shin	021143561f	[ROCm] Add missing gemm_a8w8_blockscale import (#28378 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-10 23:13:36 +00:00
Robert Shaw	30700b1cd7	[CI] Fix Plugin Tests Tests (#28413 ) Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>	2025-11-10 22:36:11 +00:00
Andrew Xia	4b94ed8f92	[Frontend][2/n] remove empty content from _parse_tool_calls_from_content (#28331 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-11-10 14:07:49 -08:00
Lucas Wilkinson	6dec9f6109	[BugFix] Fix DeepGEMM over-allocating workspace (#28254 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 17:01:17 -05:00
Wei Wei	bf6a3d0ff5	[Misc] Add more scoping for improved trace (#28329 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-10 21:03:21 +00:00
Sage Moore	40d33264c6	[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Sage Moore <sagemoore@utexas.edu> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-10 20:39:19 +00:00
Jonas M. Kübler	9c84ca8293	[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-10 12:06:04 -08:00
Rémi Delacourt	6d54336ae5	[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-10 14:53:32 -05:00
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath	b039bfda8f	[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 09:21:52 -08:00
Cyrus Leung	d0e186c16f	[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoPE (#28395 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 00:30:06 +08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
caozuoba	40e2eeeb92	[Kernel] Optimization of the mm_k operator. (#28280 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-10 16:03:46 +00:00
zejunchen-zejun	b06b9470ca	[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-11-10 10:38:56 -05:00
TJian	4673e465ff	Add @tjtanaa to codeowner for ROCm and multi-modal (#28360 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-10 21:39:17 +08:00
Ferrebo	912744d066	[Fix] optimize visual token mask with caching and multi-token support (#28374 ) Signed-off-by: Ferrebo <itachi971009@gmail.com> Signed-off-by: kebo01 <kebo01@baidu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 13:23:49 +00:00
Yu Jiaqi	15be507c86	[bugfix] fix siglip batch text output error (#28365 ) Signed-off-by: piood <2477084691@qq.com>	2025-11-10 21:21:15 +08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Shinichi Hemmi	a98cc35c34	Restore PlaMo2 unit test as `pfnet/plamo-2-1b` now supports `transformers >=4.56` (#28019 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-10 06:50:02 +00:00
Lucas Wilkinson	e8697faf03	[V0 deprecation] Remove no longer used `get_metadata_cls` (#28370 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 14:32:09 +08:00
Xiake Sun	03fa4d3fb3	[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com> Signed-off-by: Xiake Sun <xisun@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath	6b2b9fd934	[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 10:45:29 +08:00
JartX	c5f685b3ae	[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-11-09 23:09:36 +00:00
Jiangyun Zhu	c4768dcf47	[Kernel] Fix fused_gdn_gating (#28343 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-09 14:26:35 -07:00
Zhewen Li	a65a934ebe	[CI/Build] Temporary fix to LM Eval Small Models (#28324 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-09 21:08:38 +00:00
usberkeley	4a8d6bd168	Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-09 19:11:46 +00:00
Lucas Wilkinson	636efd10a5	[Core] Separate out attention metadata building logic from prepare inputs (#26764 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-09 13:51:43 -05:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
Jiangyun Zhu	7ae5a5fb11	[Misc] Add some comments in qwen3-next (#28267 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-08 23:59:24 -08:00
Yong Hoon Shin	de2b78305f	[ROCm] Add env to enable/disable aiter triton gemm (#28321 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-08 22:27:00 -08:00
Ning Xie	e5e9067e61	[Misc] fix typo and add detailed log (#28178 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-09 05:33:46 +00:00
yihong	3a7d580343	fix: close issue 28338 by fixed python version (#28339 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-09 05:07:26 +00:00
Kevin H. Luu	05f8d69077	[chore] Move some wikimedia images to S3 (#28351 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad	404d7a9d14	[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345 ) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>	2025-11-08 15:50:10 -07:00
ElizaWszola	171133f929	[Bugfix] Fix test fused quant layernorm tests (#27865 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-08 14:31:33 -08:00
Cole Murray	32787d0644	Remove setuptools upper bound constraint (<80) (#28337 ) Signed-off-by: Cole Murray <colemurray.cs@gmail.com>	2025-11-08 22:30:18 +00:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
Ev Lacey	77d702a22b	Enhance run_cluster.sh for multi-NIC support (#28328 ) Signed-off-by: Ev Lacey <elacey@nvidia.com>	2025-11-08 22:04:16 +00:00

11139 Commits