xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-04 11:55:45 +08:00

Author	SHA1	Message	Date
Isotr0py	32c9be2200	[v1] Re-add fp32 support to v1 engine through FlexAttention (#19754 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-05 09:41:10 +00:00
Jee Jee Li	906e05d840	[Misc] Remove the unused LoRA test code (#20494 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-05 13:48:16 +08:00
Michael Goin	c108781c85	[CI Bugfix] Fix pre-commit failures on main (#20502 )	2025-07-04 14:17:30 -07:00
Duncan Moss	3d184b95b8	[feat]: CUTLASS block scaled group gemm for SM100 (#19757 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Co-authored-by: Duncan Moss <dmoss@nvidia.com>	2025-07-04 12:58:04 -06:00
Thomas Parnell	2f35a022e6	Enable V1 for Hybrid SSM/Attention Models (#20016 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-07-04 17:46:53 +00:00
wang.yuqi	2e26f9156a	[Model][3/N] Automatic conversion of CrossEncoding model (#20168 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-04 05:47:39 -07:00
sangbumlikeagod	9e5452ee34	[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809 ) Signed-off-by: sangbumlikeagod <oironese@naver.com>	2025-07-04 11:28:07 +00:00
Jee Jee Li	1caca5a589	[Misc] Add SPDX-FileCopyrightText (#20428 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-04 07:40:42 +00:00
Aaron Pham	4a98edff1f	[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-04 15:05:49 +08:00
Seiji Eicher	8d1096e7db	[Bugfix] Register reducer even if transformers_modules not available (#19510 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-07-03 22:08:12 +00:00
bnellnm	78fe77534b	[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-07-03 14:55:40 -07:00
Ning Xie	1dba2c4ebe	[Misc] adjust for ipv6 for mookcacke url parse (#20107 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-03 20:27:17 +00:00
wang.yuqi	6f1229f91d	[Model][2/N] Automatic conversion of CrossEncoding model (#19978 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-03 13:59:23 +00:00
Cyrus Leung	b024a42e93	[Core] Move multimodal placeholder from chat utils to model definition (#20355 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-03 08:18:30 +00:00
Nick Hill	67d25eca05	[Tests] Update online DP tests to verify that requests are balanced (#20157 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-03 14:49:13 +08:00
qscqesze	363528de27	[Feature] Support MiniMax-M1 function calls features (#20297 ) Signed-off-by: QscQ <qscqesze@gmail.com> Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-07-03 06:48:27 +00:00
Chenheli Hua	b616f6a53d	[Misc] Small: Fix video loader return type annotations. (#20389 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-03 03:10:39 +00:00
Nick Hill	657f2f301a	[DP] Support external DP Load Balancer mode (#19790 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 10:21:52 -07:00
Nick Hill	d265414dbc	[Minor] Clean up incorrect comment in test (#20382 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 09:13:37 -07:00
afeldman-nm	48fb076cbc	[V1] LogitsProcessor programming model (#16728 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-02 09:10:42 -07:00
bnellnm	c1909e7e8c	[Kernels] MoE refactor (#19636 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-07-02 06:08:27 -07:00
WangHuaqiang	ccbfb1d1c9	[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322 ) Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>	2025-07-02 12:53:36 +00:00
CSWYF3634076	e303dcf523	[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-07-02 03:37:01 -07:00
Kwai-Keye	8452946c06	[Model][VLM] Support Keye-VL-8B-Preview (#20126 ) Signed-off-by: Kwai-Keye <Keye@kuaishou.com>	2025-07-01 23:35:04 -07:00
Chenheli Hua	2e7cbf2d7d	[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-01 23:34:03 -07:00
Chengji Yao	7da296be04	[TPU] kv cache update kernel supports dynamic grid (#20235 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-02 06:33:37 +00:00
Wentao Ye	7058d7dd5d	[Refactor] Remove duplicate `find_free_port` (#20333 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:07 -07:00
Liangliang Ma	a0389e0554	[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169 ) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-02 09:06:04 +08:00
czhu-cohere	3abfe22154	Enable group size 64 for Machete (#20290 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 18:05:44 -07:00
Wentao Ye	e81fbefe8a	[Refactor] Refactor import utils (#20269 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 18:05:42 -07:00
Woosuk Kwon	7f280d69c9	[Optimization] Cache sampled token ids in model runner (#20291 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-01 11:01:31 -07:00
aiyiwang2025	ecad851cbd	[Model]Add Tencent HunYuanMoEV1 Model Support (#20114 ) Signed-off-by: aiyiwang <aiyiwang@tencent.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-01 07:28:13 -07:00
Yuxuan Zhang	ed70f3c64f	Add GLM4.1V model (Draft) (#19331 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-01 12:48:26 +00:00
Kyle Sayers	9025a9a705	[Quant] [Bugfix] Fix quantization config matching with `hf_to_vllm_mapper` (#20046 )	2025-07-01 19:20:34 +09:00
Lionel Villard	c05596f1a3	[Perf] Validate @config in pre-commit instead of dynamically (#20200 ) Signed-off-by: Lionel Villard <villard@us.ibm.com>	2025-07-01 05:10:28 -04:00
TY-AMD	96453cfa83	[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067 ) Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>	2025-07-01 16:12:19 +08:00
Varun Sundar Rabindranath	08d81f1014	[Bugfix] Fix deepep tests (#20288 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-01 15:29:08 +08:00
Li, Jiang	6cc1e7d96d	[CPU] Update custom ops for the CPU backend (#20255 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-01 07:25:03 +00:00
czhu-cohere	9909726d2a	Enable ZP Support for Machete (#20268 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 07:12:20 +00:00
Alex Kogan	27949354fa	[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference (#18768 ) Signed-off-by: Alex Kogan <alex.kogan@oracle.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-07-01 05:44:38 +00:00
fyuan1316	e28533a16f	[Bugfix] Fix include prompt in stream response when echo=true (#15233 ) Signed-off-by: Yuan Fang <yuanfang@alauda.io>	2025-07-01 01:30:14 +00:00
Luka Govedič	6d42ce8315	[CLI] Improve CLI arg parsing for `-O`/`--compilation-config` (#20156 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-07-01 01:03:13 +00:00
Kyle Sayers	d8cf819a9a	[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-06-30 17:26:49 +00:00
Wentao Ye	551ef1631a	[Unit Test] Add unit test for deep gemm (#20090 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-30 10:26:42 -06:00
Woosuk Kwon	2863befce3	[Optimization] Use Shared `CachedRequestData` Instance Across All Requests (#20232 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-30 09:07:50 -07:00
redmoe-moutain	65b1cbb138	[Model] support dots1 (#18254 ) Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>	2025-06-29 19:34:36 -07:00
Dipika Sikka	6f2f53a82d	[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-06-29 22:05:40 +00:00
Michael Goin	7b1895e6ce	[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-29 10:31:37 +08:00
Wentao Ye	4d36693687	[Refactor] Create a function util and cache the results for `has_deepgemm`, `has_deepep`, `has_pplx` (#20187 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-28 22:06:38 +00:00
Stan Wozniak	daec9dea6e	[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137 ) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-06-28 08:16:41 -07:00

2445 Commits