WangHuaqiang
|
ccbfb1d1c9
|
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
|
2025-07-02 12:53:36 +00:00 |
|
Joonchen Liau
|
9e5552aa13
|
[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)
Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-02 06:47:19 -06:00 |
|
Lu Fang
|
0c600b9ab6
|
[Build/CI] Automatically tag DeepSeek related PRs (#20370)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-02 04:02:43 -07:00 |
|
CSWYF3634076
|
e303dcf523
|
[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-07-02 03:37:01 -07:00 |
|
Michael Yao
|
ae9c4d416f
|
[Docs] Make TPU ref prettier in google_tpu.md (#20356)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-02 02:04:08 -07:00 |
|
Michael Yao
|
d853520b3e
|
[Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-01 23:50:31 -07:00 |
|
Cyrus Leung
|
ba51aea65e
|
[Bugfix] Keye-VL compatibility with tok_kwargs (#20058) (#20353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-01 23:46:59 -07:00 |
|
Kwai-Keye
|
8452946c06
|
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
|
2025-07-01 23:35:04 -07:00 |
|
Chenheli Hua
|
2e7cbf2d7d
|
[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-07-01 23:34:03 -07:00 |
|
Chengji Yao
|
7da296be04
|
[TPU] kv cache update kernel supports dynamic grid (#20235)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-02 06:33:37 +00:00 |
|
QiliangCui
|
b205e8467d
|
[Doc][TPU] Add models and features supporting matrix. (#20230)
Signed-off-by: Qiliang Cui <cuiq@google.com>
|
2025-07-02 06:33:20 +00:00 |
|
yyzxw
|
be0cfb2b68
|
fix[Docs]: link anchor is incorrect #20309 (#20315)
Signed-off-by: zxw <1020938856@qq.com>
|
2025-07-02 06:32:34 +00:00 |
|
Cyrus Leung
|
1a03dd496b
|
[Bugfix] Fix dynamic rotary embedding (#20343)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-02 06:31:26 +00:00 |
|
Kunshang Ji
|
27b8017636
|
[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-07-01 22:26:40 -07:00 |
|
Lifans
|
9ec1e3065a
|
[Misc][Doc] Add missing comment for LLM (#20285)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2025-07-01 19:04:24 -07:00 |
|
Wentao Ye
|
9dae7d46bf
|
[Refactor] Remove Unused Env VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON (#20334)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-01 19:03:43 -07:00 |
|
Wentao Ye
|
7058d7dd5d
|
[Refactor] Remove duplicate find_free_port (#20333)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-01 19:03:07 -07:00 |
|
Liangliang Ma
|
a0389e0554
|
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-02 09:06:04 +08:00 |
|
Tyler Michael Smith
|
3be8d312a2
|
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-01 18:05:47 -07:00 |
|
czhu-cohere
|
3abfe22154
|
Enable group size 64 for Machete (#20290)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-07-01 18:05:44 -07:00 |
|
Wentao Ye
|
e81fbefe8a
|
[Refactor] Refactor import utils (#20269)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-01 18:05:42 -07:00 |
|
周周周
|
9290de5667
|
remove unused variables in marlin_template.h (#20236)
|
2025-07-02 00:51:52 +00:00 |
|
Woosuk Kwon
|
7f280d69c9
|
[Optimization] Cache sampled token ids in model runner (#20291)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 11:01:31 -07:00 |
|
TJian
|
02cabff207
|
[V1] [ROCm] Enable EP with AITER Fused MoE (#20270)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-07-01 16:48:30 +00:00 |
|
Shintarou Okada
|
3d19d47d91
|
[Frontend] Expand tools even if tool_choice="none" (#17177)
Signed-off-by: okada shintarou <okada@preferred.jp>
|
2025-07-01 12:47:38 -04:00 |
|
Woosuk Kwon
|
8acb4badee
|
[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 09:07:36 -07:00 |
|
Nicolò Lucchesi
|
314af8617c
|
[Docs] Update transcriptions API to use openai client with stream=True (#20271)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-01 15:47:13 +00:00 |
|
Woosuk Kwon
|
0e96cc9b7e
|
[Misc] Minor refactoring for scheduler (#20299)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 07:55:32 -07:00 |
|
aiyiwang2025
|
ecad851cbd
|
[Model]Add Tencent HunYuanMoEV1 Model Support (#20114)
Signed-off-by: aiyiwang <aiyiwang@tencent.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-01 07:28:13 -07:00 |
|
Yuxuan Zhang
|
ed70f3c64f
|
Add GLM4.1V model (Draft) (#19331)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-01 12:48:26 +00:00 |
|
Nicolò Lucchesi
|
650d5dbd04
|
[Misc] Minor refactor of NIXL background handshake (#20068)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-01 12:40:14 +01:00 |
|
Kyle Sayers
|
9025a9a705
|
[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046)
|
2025-07-01 19:20:34 +09:00 |
|
Lionel Villard
|
c05596f1a3
|
[Perf] Validate @config in pre-commit instead of dynamically (#20200)
Signed-off-by: Lionel Villard <villard@us.ibm.com>
|
2025-07-01 05:10:28 -04:00 |
|
Reid
|
787b13389e
|
[doc] fix the incorrect logo in dark mode (#20289)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-01 08:18:09 +00:00 |
|
TY-AMD
|
96453cfa83
|
[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
|
2025-07-01 16:12:19 +08:00 |
|
Kebe
|
b1c1fe35a5
|
[Misc] remove redundant char (#20287)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-01 15:33:22 +08:00 |
|
Varun Sundar Rabindranath
|
08d81f1014
|
[Bugfix] Fix deepep tests (#20288)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-01 15:29:08 +08:00 |
|
Li, Jiang
|
6cc1e7d96d
|
[CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-01 07:25:03 +00:00 |
|
czhu-cohere
|
9909726d2a
|
Enable ZP Support for Machete (#20268)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-07-01 07:12:20 +00:00 |
|
Prashant Gupta
|
22e9d42040
|
[Misc] add xgrammar for arm64 (#18359)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
|
2025-07-01 07:02:20 +00:00 |
|
Richard Barnes
|
86debab54c
|
Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082)
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-01 06:48:10 +00:00 |
|
Michael Goin
|
be250bbc67
|
[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank (#19516)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-01 06:02:09 +00:00 |
|
Alex Kogan
|
27949354fa
|
[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference (#18768)
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-07-01 05:44:38 +00:00 |
|
Ernest Wong
|
bd5038af07
|
[Doc] add config and troubleshooting guide for NCCL & GPUDirect RDMA (#15897)
Signed-off-by: Ernest Wong <chwong719@gmail.com>
|
2025-06-30 21:44:39 -07:00 |
|
Chendi.Xue
|
a2f14dc8f9
|
[CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test (#20196)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-07-01 04:17:07 +00:00 |
|
Kuntai Du
|
92ee7baaf9
|
[Example] add one-click runnable example for P2P NCCL XpYd (#20246)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-06-30 21:03:55 -07:00 |
|
Woosuk Kwon
|
7151f92241
|
[Misc] Fix spec decode example (#20296)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 21:01:48 -07:00 |
|
fyuan1316
|
e28533a16f
|
[Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
|
2025-07-01 01:30:14 +00:00 |
|
Luka Govedič
|
6d42ce8315
|
[CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-07-01 01:03:13 +00:00 |
|
Zhonghua Deng
|
ded1fb635b
|
[Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-30 16:45:14 -07:00 |
|