qscqesze
|
5e9455ae8f
|
[Bugfix]: Fix the streaming output for function calls in the minimax (#22015)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-08-06 20:30:27 -07:00 |
|
Michael Goin
|
a00d8b236f
|
Use float32 for test_completion.py (#22385)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-07 11:07:47 +08:00 |
|
Cyrus Leung
|
04cf435d95
|
[Bugfix] Fix wrong method name in Intern-S1 image processor (#22417)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-06 20:05:20 -07:00 |
|
Tao He
|
7377131a2c
|
[Qwen3] Enable dual-chunk-attention support for Qwen3 models. (#21924)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-08-06 19:58:08 -07:00 |
|
Kunshang Ji
|
6b47ef24de
|
[XPU]Fix flash_attn_varlen_func interface on xpu (#22350)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-06 19:28:11 -07:00 |
|
Lucas Wilkinson
|
1dc8a70b6d
|
[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-06 18:40:52 -07:00 |
|
Maximilien de Bayser
|
f825c6bd22
|
Support encoder_only attention for FlexAttention (#22273)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-06 18:37:14 -07:00 |
|
tc-mb
|
41b67f4263
|
[model] Support MiniCPM-V 4.0 (#22166)
Co-authored-by: imning3 <hbning@pku.edu.cn>
|
2025-08-06 18:35:46 -07:00 |
|
Michael Goin
|
e8961e963a
|
Update flashinfer-python==0.2.10 (#22389)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-06 18:10:24 -07:00 |
|
Lain
|
9a3835aaa9
|
Fix trtllm-gen attention env and add attention sink (#22378)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lain <fusiyuan2000@hotmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 18:07:41 -07:00 |
|
Yongye Zhu
|
5c7cc33f4d
|
[gpt-oss] fix model config with hf_config (#22401)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 18:04:04 -07:00 |
|
Chen Zhang
|
19c9365aa4
|
[gpt-oss] add demo tool server (#22393)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-06 17:47:14 -07:00 |
|
Wentao Ye
|
eec890c1c1
|
[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (#22399)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-06 17:03:53 -07:00 |
|
Asaf Joseph Gardin
|
46a13949d5
|
[v1] - Mamba1 Attention Metadata (#21249)
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-08-06 17:03:42 -07:00 |
|
Yongye Zhu
|
31f09c615f
|
[gpt-oss] flashinfer mxfp4 (#22339)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-08-06 12:37:27 -07:00 |
|
Yongye Zhu
|
31f5dc5b2a
|
[gpt-oss] Enhance error msg on attention sink init (#22335)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-08-06 11:41:42 -07:00 |
|
Woosuk Kwon
|
ec7cb19224
|
[gpt-oss] Add loop for built-in tool call (#22374)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 10:32:21 -07:00 |
|
Gregory Shtrasberg
|
2435ea7ed5
|
[Bugfix] Make condition in triton kernel constexpr (#22370)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-08-06 10:00:58 -07:00 |
|
Lucas Wilkinson
|
4a6b72c2ab
|
[BugFix] Fix triton compile error in kernel_unified_attention_2/3d caused by attention sinks (#22368)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-08-06 09:47:38 -07:00 |
|
Zhang Jason
|
b4b9813b5e
|
add the codes to check AMD Instinct GPU number (#22367)
Signed-off-by: Zhang Jason <ning.zhang2@amd.com>
|
2025-08-06 08:58:38 -07:00 |
|
Lucas Wilkinson
|
2cb6ef8996
|
[BugFix] Fix FA2 RuntimeError when sinks is provided (#22365)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-08-06 08:03:03 -07:00 |
|
Woosuk Kwon
|
9edd1db02b
|
[Minor] Fix type (#22347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-06 02:22:03 -07:00 |
|
Woosuk Kwon
|
f263a4b53f
|
[gpt-oss] Support chat completion api (#22342)
|
2025-08-06 01:57:39 -07:00 |
|
Roger Wang
|
54991c548a
|
[gpt-oss] add model to supported models doc (#22336)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-08-06 01:49:44 -07:00 |
|
Woosuk Kwon
|
178d03fbd6
|
[gpt-oss] Add Tool/ConversationContext classes and harmony_utils (#22340)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 01:08:49 -07:00 |
|
Isotr0py
|
fa00c5d75b
|
[Misc] Clean up duplicated hf overrides (#22311)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-06 07:50:25 +00:00 |
|
Woosuk Kwon
|
134a8ee8fd
|
[gpt-oss] Add openai-harmony as default dependency (#22332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 00:10:14 -07:00 |
|
Yongye Zhu
|
90ec006937
|
[gpt-oss] flashinfer attention sink init (#22330)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
|
2025-08-05 23:48:19 -07:00 |
|
Chen Zhang
|
a47e6ffe93
|
[GptOss] Add GptOss reasoning parser to support structure output (#22322)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-05 23:39:13 -07:00 |
|
Woosuk Kwon
|
98a3a81024
|
[ROCm] Add attention sink to use_rocm_custom_paged_attention (#22329)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-05 23:30:38 -07:00 |
|
Woosuk Kwon
|
de98252f49
|
Add GPT-OSS model code and config [1/N] (#22327)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 23:26:00 -07:00 |
|
Harry Mellor
|
796bae07c5
|
Update transformers to v4.55 (#21931)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 22:56:14 -07:00 |
|
Woosuk Kwon
|
6e20924350
|
Add attention sink in attention backends (#22320)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-05 22:37:21 -07:00 |
|
Woosuk Kwon
|
dd16bdc798
|
Increase openai-python version (#22316)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 21:43:21 -07:00 |
|
Woosuk Kwon
|
e3c876dca3
|
Upgrade FA3 for attention sink (#22313)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 21:36:21 -07:00 |
|
Gregory Shtrasberg
|
5d5d419ca6
|
[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm (#22264)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-08-05 20:39:32 -07:00 |
|
Rui Qiao
|
302962e806
|
[Bugfix] Skip dead and non-GPU nodes for Ray DP engine allocation (#22275)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-05 20:35:32 -07:00 |
|
Benjamin Chislett
|
7e6544c797
|
[Perf] Parallelize fill_bitmask to accelerate high-throughput guided decoding (#21862)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-08-05 19:57:49 -07:00 |
|
Jee Jee Li
|
8e6c7e873f
|
[Bugfix] Fix MoE BNB version (#22260)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-05 19:56:22 -07:00 |
|
Michael Goin
|
6a51530437
|
[Bugfix] Fix 3D input passed into cutlass_scaled_mm (#22278)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-06 10:35:20 +08:00 |
|
Michael Goin
|
35509fc5be
|
[Bugfix] Remove faulty test for oot attention backend (#22286)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-06 00:05:40 +00:00 |
|
Siyuan Liu
|
4b29d2784b
|
[CI][TPU] Fix docker clean up (#22271)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-08-05 23:54:56 +00:00 |
|
youkaichao
|
59a0b8554b
|
[bugfix] fix blackwell deepep installation (#22255)
|
2025-08-06 01:26:09 +08:00 |
|
Giancarlo Delfin
|
469b3ffaaa
|
[V1] port xformers backend to v1 (#21342)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-05 10:04:46 -07:00 |
|
Wentao Ye
|
ae87ddd040
|
[Refactor] Remove Unused Environment Variable VLLM_NO_DEPRECATION_WARNING (#22199)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-05 09:40:23 -07:00 |
|
Michael Goin
|
a7cb6101ca
|
[CI/Build] Update flashinfer to 0.2.9 (#22233)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 09:39:38 -07:00 |
|
Michael Goin
|
c494f96fbc
|
Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail (#22128)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 06:57:10 -07:00 |
|
Nicolò Lucchesi
|
0c275ad5ad
|
[V0 Deprecation][TPU] Remove V1 flag check from tests (#22248)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-05 06:53:23 -07:00 |
|
Ning Xie
|
74333ae2f6
|
[Misc] correct static type check for GroupCoordinator (#21946)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-05 03:17:46 -07:00 |
|
elvischenv
|
83156c7b89
|
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-05 02:45:34 -07:00 |
|