xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-26 13:07:16 +08:00

Author	SHA1	Message	Date
Tyler Michael Smith	8209f9057d	i honestly can't believe i spelled it that way Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-04 15:14:03 -04:00
Tyler Michael Smith	19c51c3439	merge main, add environment variable, factor into function Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-04 15:11:40 -04:00
Duncan Moss	3d184b95b8	[feat]: CUTLASS block scaled group gemm for SM100 (#19757 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Co-authored-by: Duncan Moss <dmoss@nvidia.com>	2025-07-04 12:58:04 -06:00
Thomas Parnell	2f35a022e6	Enable V1 for Hybrid SSM/Attention Models (#20016 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-07-04 17:46:53 +00:00
Chenheli Hua	ffe00ef77a	[Misc] Small: Remove global media connector. Each test should have its own test connector object. (#20395 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-04 08:15:03 -07:00
wang.yuqi	2e26f9156a	[Model][3/N] Automatic conversion of CrossEncoding model (#20168 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-04 05:47:39 -07:00
sangbumlikeagod	9e5452ee34	[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809 ) Signed-off-by: sangbumlikeagod <oironese@naver.com>	2025-07-04 11:28:07 +00:00
Michael Goin	0e3fe896e2	Support Llama 4 for fused_marlin_moe (#20457 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-04 07:55:10 +00:00
Jee Jee Li	1caca5a589	[Misc] Add SPDX-FileCopyrightText (#20428 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-04 07:40:42 +00:00
Aaron Pham	4a98edff1f	[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-04 15:05:49 +08:00
Gabriel Marinho	a4113b035c	[Platform] Add custom default max tokens (#18557 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>	2025-07-04 10:50:17 +08:00
Michael Goin	7e1665b089	[Misc] Change warn_for_unimplemented_methods to debug (#20455 )	2025-07-04 02:35:08 +00:00
Seiji Eicher	8d1096e7db	[Bugfix] Register reducer even if transformers_modules not available (#19510 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-07-03 22:08:12 +00:00
Nicolò Lucchesi	8d775dd30a	[Misc] Fix `Unable to detect current VLLM config. Defaulting to NHD kv cache layout` warning (#20400 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-03 14:56:09 -07:00
bnellnm	78fe77534b	[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-07-03 14:55:40 -07:00
Yuxuan Zhang	2f2fcb31b8	[Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-07-03 21:41:13 +00:00
Ning Xie	1dba2c4ebe	[Misc] adjust for ipv6 for mookcacke url parse (#20107 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-03 20:27:17 +00:00
Isotr0py	71d6de3a26	[Misc] Clean up InternVL family config registration (#19992 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-03 20:01:47 +00:00
Reid	619b9f5c7e	[Frontend] fix duplicate output for bench subcmd (#20446 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-03 08:02:06 -07:00
Reid	9854dc9040	[Frontend] improve vllm bench <bench_type> --help display (#20430 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-03 14:22:16 +00:00
wang.yuqi	6f1229f91d	[Model][2/N] Automatic conversion of CrossEncoding model (#19978 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-03 13:59:23 +00:00
Jee Jee Li	1819fbda63	[Quantization] Bump to use latest bitsandbytes (#20424 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-03 21:58:46 +08:00
Ning Xie	fb14d53cf6	[Kernel] refactor cpu worker v0 cache dtype (#20080 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-03 08:39:14 +00:00
Cyrus Leung	b024a42e93	[Core] Move multimodal placeholder from chat utils to model definition (#20355 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-03 08:18:30 +00:00
qscqesze	363528de27	[Feature] Support MiniMax-M1 function calls features (#20297 ) Signed-off-by: QscQ <qscqesze@gmail.com> Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-07-03 06:48:27 +00:00
Li, Jiang	0ec3779df7	[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-02 20:11:36 -07:00
Chenheli Hua	b616f6a53d	[Misc] Small: Fix video loader return type annotations. (#20389 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-03 03:10:39 +00:00
bnellnm	2e25bb12a8	[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py (#20381 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-07-03 02:07:43 +00:00
Nick Hill	059d4cdb49	[BugFix] Fix DP headless mode arg validation (#20398 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 17:15:32 -07:00
Nick Hill	657f2f301a	[DP] Support external DP Load Balancer mode (#19790 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 10:21:52 -07:00
vllmellm	a1aafc827a	[ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) (#20254 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-07-02 16:25:46 +00:00
rongfu.leng	139508a418	[Misc] add handler HF_TOKEN is emptry string (#20369 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-07-02 09:14:31 -07:00
afeldman-nm	48fb076cbc	[V1] LogitsProcessor programming model (#16728 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-02 09:10:42 -07:00
bnellnm	c1909e7e8c	[Kernels] MoE refactor (#19636 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-07-02 06:08:27 -07:00
zichongli5	706ff13224	[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286 ) Signed-off-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net> Co-authored-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-07-02 12:54:12 +00:00
WangHuaqiang	ccbfb1d1c9	[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322 ) Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>	2025-07-02 12:53:36 +00:00
CSWYF3634076	e303dcf523	[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-07-02 03:37:01 -07:00
Cyrus Leung	ba51aea65e	[Bugfix] Keye-VL compatibility with `tok_kwargs` (#20058 ) (#20353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-01 23:46:59 -07:00
Kwai-Keye	8452946c06	[Model][VLM] Support Keye-VL-8B-Preview (#20126 ) Signed-off-by: Kwai-Keye <Keye@kuaishou.com>	2025-07-01 23:35:04 -07:00
Chenheli Hua	2e7cbf2d7d	[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-01 23:34:03 -07:00
Chengji Yao	7da296be04	[TPU] kv cache update kernel supports dynamic grid (#20235 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-02 06:33:37 +00:00
Cyrus Leung	1a03dd496b	[Bugfix] Fix dynamic rotary embedding (#20343 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-02 06:31:26 +00:00
Kunshang Ji	27b8017636	[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-01 22:26:40 -07:00
Lifans	9ec1e3065a	[Misc][Doc] Add missing comment for LLM (#20285 ) Signed-off-by: Lifan Shen <lifans@meta.com>	2025-07-01 19:04:24 -07:00
Wentao Ye	9dae7d46bf	[Refactor] Remove Unused Env `VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON` (#20334 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:43 -07:00
Wentao Ye	7058d7dd5d	[Refactor] Remove duplicate `find_free_port` (#20333 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:07 -07:00
Liangliang Ma	a0389e0554	[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169 ) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-02 09:06:04 +08:00
czhu-cohere	3abfe22154	Enable group size 64 for Machete (#20290 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 18:05:44 -07:00
Wentao Ye	e81fbefe8a	[Refactor] Refactor import utils (#20269 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 18:05:42 -07:00
Woosuk Kwon	7f280d69c9	[Optimization] Cache sampled token ids in model runner (#20291 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-01 11:01:31 -07:00

5048 Commits