9485 Commits

Author SHA1 Message Date
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias (#24895)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Wentao Ye
de2cc3d867
[Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:03:29 -06:00
Michael Goin
e95084308b
Updated CODEOWNERS for flashinfer, mla, fused_moe (#24906)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-16 02:01:28 +00:00
Sergio Paniego Blanco
7f6f2c1182
HuggingFace -> Hugging Face in Integration with Hugging Face docs (#24889)
2025-09-15 17:28:35 -07:00
Jiangyun Zhu
5bcc153d7b
[Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-15 23:33:18 +00:00
Mickaël Seznec
45bfa49cb8
[Tests] fix initialization of kv hash in tests (#24273)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-09-15 21:48:27 +00:00
Simon Mo
fd2f10546c
[ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-15 14:39:08 -07:00
Wentao Ye
e757a629e7
[Bug] Fix Cutlass Scaled MM Compilation Error (#24887)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 17:21:17 -04:00
Alexander Matveev
aae725af7c
[Performance] Remove redundant clone() calls in cutlass_mla (#24891)
2025-09-15 20:21:53 +00:00
Andrew Xia
73df49ef3a
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:08:08 -07:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Benjamin Bartels
94b03f88dd
Bump Flashinfer to 0.3.1 (#24868)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-09-15 12:45:55 -07:00
Sage Moore
49bfc538e4
Update num_tokens_across_dp to use nccl instead of gloo (#24105)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-09-15 19:05:48 +00:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms (#24106)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Rafael Marcelino Koike
b834b4cbf1
[USAGE] Improve error handling for weight initialization in Unquantized… (#20321)
Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
2025-09-15 16:45:49 +00:00
Harry Mellor
740f0647b1
Reinstate existing torch script (#24729)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-15 09:43:40 -07:00
xiao-llm
01413e0cf5
Fp8 paged attention update (#22222)
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com>
Co-authored-by: Xiao Yu <xiao.yu@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
2025-09-15 10:43:26 -04:00
Isotr0py
0e219cd50b
[Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 20:45:06 +08:00
ant-yy
72c99f2a75
[Model]: support Ling2.0 (#24627)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 05:09:30 -07:00
wang.yuqi
bf214ca226
[Misc] Fix examples openai_pooling_client.py (#24853)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-15 11:57:30 +00:00
Nicolò Lucchesi
2e41f5abca
[XPU] Set consistent default KV cache layout (#24745)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 18:09:34 +08:00
Ning Xie
bc0f6059a2
[UT] enhance free kv cache block queue popleft_n (#24220)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 10:04:37 +00:00
Chao Lei
8de261b04a
[P/D]kv_output_aggregator support P TP > D TP (#23917)
Signed-off-by: LCAIZJ <leichao139636@163.com>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
2025-09-15 11:36:06 +02:00
Nicolò Lucchesi
a0d8b9738d
[Misc] Own KVConnectors installation (#24867)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 02:21:09 -07:00
Ning Xie
59e17dd4a0
[Misc] rename interval to max_recent_requests (#24229)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 09:18:42 +00:00
Didier Durand
4979eb79da
[Doc]: fix typos in various files (#24821)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
bingchen-mi
a8c0f59973
[Bugfix] MiDashengLM model contact error under concurrent testing (#24738)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
2025-09-15 06:38:12 +00:00
Ce Gao
f4a948f33f
[Frontend] Skip stop in reasoning content (#14550)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-15 06:04:55 +00:00
Ning Xie
3f3313981c
[kv cache] update num_free_blocks in the end (#24228)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 05:15:12 +00:00
Michael Yao
78818dd1b0
[Docs] Have a try to improve frameworks/streamlit.md (#24841)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-14 21:50:36 -07:00
Chen Zhang
8e5cdcda4e
[Hybrid Allocator] Support Pipeline Parallel (#23974)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-14 15:55:17 -07:00
wuhang
90f3f7d73e
[Spec Decoding]Support Spec Decoding Metrics in DP Mode (#24049)
Signed-off-by: wuhang <wuhang6@huawei.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 21:11:09 +00:00
Robert Shaw
6dc8da5dc1
[Chore] Remove ipex_ops warning (#24835)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 19:41:53 +00:00
FengjinChen
79cbcab871
Force use C++17 globally to avoid compilation error (#24823)
Signed-off-by: chenfengjin <1871653365@qq.com>
2025-09-14 19:30:10 +00:00
Ye (Charlotte) Qi
ff68035932
[Benchmarks] Throw usage error when using dataset-name random and dataset-path together (#24819)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-14 17:50:01 +00:00
co63oc
1177dd53e9
fix type of sampling rate for encode_base64 (#24826)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-14 16:17:16 +00:00
Wentao Ye
fc2dbcda8b
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 11:20:17 -04:00
Hyogeun Oh (오효근)
fec347dee1
[Misc] Improve s3_utils type hints with BaseClient (#24825)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-14 12:11:14 +00:00
Wenlong Wang
cc3173ae98
[Multi Modal][Performance] Fused Q,K's apply_rope into one (#24511)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-14 08:10:21 +00:00
Woosuk Kwon
3e903b6cb4
[Chore] Minor simplification for non-PP path (#24810)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-13 17:41:36 -07:00
Victor Ziliang Peng
973c9d01da
[Minor] Simplify duplicative device check for cuda (#24793)
Signed-off-by: Ziliang Peng <ziliangdotme@gmail.com>
2025-09-13 18:28:38 +00:00
TaoYu Chen
15b8fef453
Remove redundant assignment in xfer_buffers, This is a little fix (#24732)
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-09-13 08:11:59 +00:00
Wenlong Wang
cfa3234a5b
[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-13 15:45:11 +08:00
Didier Durand
41ae4a1eab
[Doc]: fix typos in various files (#24798)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00
Russell Bryant
4dad72f0d9
[Misc] Correct an outdated comment. (#24765)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-13 00:34:53 -07:00
Michael Goin
59d7ffc17f
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-13 07:29:19 +00:00
Lukas Geiger
1da0f1441d
[Core][Multimodal] Cache supports_kw (#24773)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-13 07:27:04 +00:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
elvischenv
dbeee3844c
[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-13 00:16:24 -07:00