xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-05 10:37:10 +08:00

Author	SHA1	Message	Date
Harry Mellor	0faf3cc3e8	Move `SpeculativeConfig` from `config/__init__.py` to `config/speculative.py` (#24904 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-16 12:51:35 +01:00
Chen Bruce	7ea5c73ad7	[Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745 ) Signed-off-by: bruceszchen <bruceszchen@tencent.com> Signed-off-by: Chen Bruce <bruceszchen@tencent.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com> Co-authored-by: lemon412 <lemon412@foxmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-16 10:55:16 +00:00
tomeras91	27fcfe7bcf	[Mamba] Support TP>1 with quantization for mamba2 mixer in case `n_groups % tp_size == 0` (#24593 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-16 10:51:01 +00:00
Cheng Kuan Yong Jason	68dbde5dbb	[Bugfix] remove duplicate tokens streamed in required tool choice streaming (#23312 ) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-09-16 15:16:32 +08:00
Jee Jee Li	04ad0dc275	[benchmark] Add triton version in the moe tuned config (#24769 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 14:10:54 +08:00
Saman A. Pour	238c4c1705	[QWEN NEXT] Fused MoE kernels Optimization configs (#24924 ) Signed-off-by: Saman Keon <samanamp@outlook.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 13:06:03 +08:00
vllmellm	8c54610265	[Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target (#24505 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-16 04:45:38 +00:00
cascade	17871983a2	[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-09-16 04:32:32 +00:00
Woosuk Kwon	759ef49b15	Remove V0 Encoder-Decoder Support (#24907 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-15 21:17:14 -07:00
Kunshang Ji	5206ab20ba	[XPU] Fix circular import error. (#24927 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-16 03:35:36 +00:00
Lu Fang	0af3ce1355	Upgrade flashinfer to 0.3.1 (#24470 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-16 02:36:09 +00:00
Richard Zou	e1279ef00f	[Docs] Update instructions for how to using existing torch binary (#24892 ) Signed-off-by: Richard Zou <zou3519@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-16 02:25:50 +00:00
Mark McLoughlin	2942970d44	[Metrics] Hide deprecated metrics with gpu_ prefix (#24245 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-09-15 20:15:57 -06:00
Wentao Ye	3c96e7b8a1	[CI] Small Accuracy Eval Test for Deepseek Model (#24259 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 20:14:50 -06:00
Wentao Ye	b42566f440	[Bug] Fix `is_flashmla_supported` Check Error (#24774 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 20:10:55 -06:00
Reza Barazesh	d96e11167d	Add pytest-cov and .coveragerc (#24778 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>	2025-09-15 20:08:46 -06:00
Gregory Shtrasberg	2891603efd	[ROCm][Bugfix] Fix the case where there's bias (#24895 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-15 20:05:12 -06:00
Wentao Ye	de2cc3d867	[Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 20:03:29 -06:00
Michael Goin	e95084308b	Updated CODEOWNERS for flashinfer, mla, fused_moe (#24906 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-16 02:01:28 +00:00
Sergio Paniego Blanco	7f6f2c1182	`HuggingFace` -> `Hugging Face` in `Integration with Hugging Face` docs (#24889 )	2025-09-15 17:28:35 -07:00
Jiangyun Zhu	5bcc153d7b	[Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-15 23:33:18 +00:00
Mickaël Seznec	45bfa49cb8	[Tests] fix initialization of kv hash in tests (#24273 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-09-15 21:48:27 +00:00
Simon Mo	fd2f10546c	[ci] fix wheel names for arm wheels (#24898 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-09-15 14:39:08 -07:00
Wentao Ye	e757a629e7	[Bug] Fix Cutlass Scaled MM Compilation Error (#24887 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 17:21:17 -04:00
Alexander Matveev	aae725af7c	[Performance] Remove redundant clone() calls in cutlass_mla (#24891 )	2025-09-15 20:21:53 +00:00
Andrew Xia	73df49ef3a	[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759 ) Signed-off-by: Andrew Xia <axia@meta.com>	2025-09-15 13:08:08 -07:00
Andrew Xia	25aba2b6a3	[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561 ) Signed-off-by: Andrew Xia <axia@meta.com>	2025-09-15 13:07:55 -07:00
Benjamin Bartels	94b03f88dd	Bump Flashinfer to 0.3.1 (#24868 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-09-15 12:45:55 -07:00
Sage Moore	49bfc538e4	Update num_tokens_across_dp to use nccl instead of gloo (#24105 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-09-15 19:05:48 +00:00
Kyle Sayers	a0b26701c9	[Transform] Deterministic Hadacore Transforms (#24106 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-09-15 12:59:31 -06:00
Harry Mellor	c4afdb69cc	Move `MultiModalConfig` from `config/__init__.py` to `config/multimodal.py` (#24659 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-15 17:43:16 +00:00
Rafael Marcelino Koike	b834b4cbf1	[USAGE] Improve error handling for weight initialization in Unquantized… (#20321 ) Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com> Signed-off-by: Rafael Koike <koike.rafael@gmail.com>	2025-09-15 16:45:49 +00:00
Harry Mellor	740f0647b1	Reinstate existing torch script (#24729 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-15 09:43:40 -07:00
xiao-llm	01413e0cf5	Fp8 paged attention update (#22222 ) Signed-off-by: Xiao Yu <xiao.yu@amd.com> Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com> Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com> Co-authored-by: Xiao Yu <xiao.yu@amd.com> Co-authored-by: Bowen Bao <bowenbao@amd.com>	2025-09-15 10:43:26 -04:00
Isotr0py	0e219cd50b	[Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-15 20:45:06 +08:00
ant-yy	72c99f2a75	[Model]: support Ling2.0 (#24627 ) Signed-off-by: vito.yy <vito.yy@antgroup.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-15 05:09:30 -07:00
wang.yuqi	bf214ca226	[Misc] Fix examples openai_pooling_client.py (#24853 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-15 11:57:30 +00:00
Nicolò Lucchesi	2e41f5abca	[XPU] Set consistent default KV cache layout (#24745 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-15 18:09:34 +08:00
Ning Xie	bc0f6059a2	[UT] enhance free kv cache block queue popleft_n (#24220 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-09-15 10:04:37 +00:00
Chao Lei	8de261b04a	[P/D]`kv_output_aggregator` support P TP > D TP (#23917 ) Signed-off-by: LCAIZJ <leichao139636@163.com> Co-authored-by: leichao.lc <leichao.lc@antgroup.com>	2025-09-15 11:36:06 +02:00
Nicolò Lucchesi	a0d8b9738d	[Misc] Own KVConnectors installation (#24867 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-15 02:21:09 -07:00
Ning Xie	59e17dd4a0	[Misc] rename interval to max_recent_requests (#24229 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-09-15 09:18:42 +00:00
Didier Durand	4979eb79da	[Doc]: fix typos in various files (#24821 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-15 01:08:52 -07:00
bingchen-mi	a8c0f59973	[Bugfix] MiDashengLM model contact error under concurrent testing (#24738 ) Signed-off-by: chenbing8 <chenbing8@xiaomi.com> Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>	2025-09-15 06:38:12 +00:00
Ce Gao	f4a948f33f	[Frontend] Skip `stop` in reasoning content (#14550 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-09-15 06:04:55 +00:00
Ning Xie	3f3313981c	[kv cache] update num_free_blocks in the end (#24228 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-09-15 05:15:12 +00:00
Michael Yao	78818dd1b0	[Docs] Have a try to improve frameworks/streamlit.md (#24841 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-14 21:50:36 -07:00
Chen Zhang	8e5cdcda4e	[Hybrid Allocator] Support Pipeline Parallel (#23974 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-14 15:55:17 -07:00
wuhang	90f3f7d73e	[Spec Decoding]Support Spec Decoding Metrics in DP Mode (#24049 ) Signed-off-by: wuhang <wuhang6@huawei.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-14 21:11:09 +00:00
Robert Shaw	6dc8da5dc1	[Chore] Remove ipex_ops warning (#24835 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-14 19:41:53 +00:00

9501 Commits