Chao Lei
8de261b04a
[P/D]kv_output_aggregator support P TP > D TP ( #23917 )
...
Signed-off-by: LCAIZJ <leichao139636@163.com>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
2025-09-15 11:36:06 +02:00
Nicolò Lucchesi
a0d8b9738d
[Misc] Own KVConnectors installation ( #24867 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 02:21:09 -07:00
Ning Xie
59e17dd4a0
[Misc] rename interval to max_recent_requests ( #24229 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 09:18:42 +00:00
Didier Durand
4979eb79da
[Doc]: fix typos in various files ( #24821 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
bingchen-mi
a8c0f59973
[Bugfix] MiDashengLM model contact error under concurrent testing ( #24738 )
...
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
2025-09-15 06:38:12 +00:00
Ce Gao
f4a948f33f
[Frontend] Skip stop in reasoning content ( #14550 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-15 06:04:55 +00:00
Ning Xie
3f3313981c
[kv cache] update num_free_blocks in the end ( #24228 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 05:15:12 +00:00
Michael Yao
78818dd1b0
[Docs] Have a try to improve frameworks/streamlit.md ( #24841 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-14 21:50:36 -07:00
Chen Zhang
8e5cdcda4e
[Hybrid Allocator] Support Pipeline Parallel ( #23974 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-14 15:55:17 -07:00
wuhang
90f3f7d73e
[Spec Decoding]Support Spec Decoding Metrics in DP Mode ( #24049 )
...
Signed-off-by: wuhang <wuhang6@huawei.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 21:11:09 +00:00
Robert Shaw
6dc8da5dc1
[Chore] Remove ipex_ops warning ( #24835 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 19:41:53 +00:00
FengjinChen
79cbcab871
Force use C++17 globally to avoid compilation error ( #24823 )
...
Signed-off-by: chenfengjin <1871653365@qq.com>
2025-09-14 19:30:10 +00:00
Ye (Charlotte) Qi
ff68035932
[Benchmarks] Throw usage error when using dataset-name random and dataset-path together ( #24819 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-14 17:50:01 +00:00
co63oc
1177dd53e9
fix type of sampling rate for encode_base64 ( #24826 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-14 16:17:16 +00:00
Wentao Ye
fc2dbcda8b
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement ( #24783 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 11:20:17 -04:00
Hyogeun Oh (오효근)
fec347dee1
[Misc] Improve s3_utils type hints with BaseClient ( #24825 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-14 12:11:14 +00:00
Wenlong Wang
cc3173ae98
[Multi Modal][Performance] Fused Q,K's apply_rope into one ( #24511 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-14 08:10:21 +00:00
Woosuk Kwon
3e903b6cb4
[Chore] Minor simplification for non-PP path ( #24810 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-13 17:41:36 -07:00
Victor Ziliang Peng
973c9d01da
[Minor] Simplify duplicative device check for cuda ( #24793 )
...
Signed-off-by: Ziliang Peng <ziliangdotme@gmail.com>
2025-09-13 18:28:38 +00:00
TaoYu Chen
15b8fef453
Remove redundant assignment in xfer_buffers, This is a little fix ( #24732 )
...
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-09-13 08:11:59 +00:00
Wenlong Wang
cfa3234a5b
[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again ( #24771 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-13 15:45:11 +08:00
Didier Durand
41ae4a1eab
[Doc]: fix typos in various files ( #24798 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00
Russell Bryant
4dad72f0d9
[Misc] Correct an outdated comment. ( #24765 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-13 00:34:53 -07:00
Michael Goin
59d7ffc17f
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe ( #24750 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-13 07:29:19 +00:00
Lukas Geiger
1da0f1441d
[Core][Multimodal] Cache supports_kw ( #24773 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-13 07:27:04 +00:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 ( #24054 )
...
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
elvischenv
dbeee3844c
[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization ( #24757 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-13 00:16:24 -07:00
Rakesh Asapanna
30498f2a65
[Doc]: Remove 404 hyperlinks ( #24785 )
...
Signed-off-by: Rakesh Asapanna <45640029+rozeappletree@users.noreply.github.com>
2025-09-13 00:15:41 -07:00
Harry Mellor
abc7989adc
[Docs] Remove Neuron install doc as backend no longer exists ( #24396 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-13 00:15:03 -07:00
Hyogeun Oh (오효근)
9a8966bcc2
[Docs] Fix warnings in mkdocs build (continued) ( #24791 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-13 00:13:44 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel ( #24789 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Jee Jee Li
99bfef841f
[Bugfix] Fix GPUModelRunner has no attribute lora_manager ( #24762 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 23:55:14 -07:00
Shane A
89e08d6d18
[Model] Add Olmo3 model implementation ( #24534 )
...
Signed-off-by: Shane A <shanea@allenai.org>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-13 03:26:21 +00:00
Chenheli Hua
7f2ea7074e
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. ( #23950 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-13 02:16:06 +00:00
Nick Hill
4fdd6f5cbf
[Core] Support async scheduling with uniproc executor ( #24219 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-12 16:34:28 -07:00
Tao He
8226dd56bf
[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes ( #24660 ) ( #24667 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-12 22:31:32 +00:00
Matthew Bonanni
5fe643fc26
Add FLASHINFER_MLA to backend selector test ( #24753 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 22:30:07 +00:00
Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode ( #24705 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Alexandre Marques
c89ed8de43
Invert pattern order to make sure that out_proj layers are identified ( #24781 )
...
Signed-off-by: Alexandre Marques <almarque@redhat.com>
2025-09-12 14:45:29 -07:00
Wentao Ye
3beadc2f25
[Compilation Bug] Fix Inductor Graph Output with Shape Issue ( #24772 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-12 21:23:05 +00:00
Clayton Coleman
bc636f21a6
[Benchmark] Allow arbitrary headers to be passed to benchmarked endpoints ( #23937 )
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-12 13:57:53 -07:00
Zhewen Li
017354c0ef
[CI] Trigger BC Linter when labels are added/removed ( #24767 )
2025-09-12 11:44:36 -07:00
Cyrus Leung
010acc6e1e
[Bugfix] Fix incompatibility between #20452 and #24548 ( #24754 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-12 11:17:29 -07:00
afeldman-nm
c8c42597ab
[CI] Speed up model unit tests in CI ( #24253 )
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-12 10:36:50 -07:00
Michael Goin
9d2a44606d
[UX] Remove AsyncLLM torch profiler disabled log ( #24609 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-12 10:08:44 -07:00
Samit
f17c075884
[Model] Switch to Fused RMSNorm in GLM-4.1V model ( #24733 )
...
Signed-off-by: SamitHuang <285365963@qq.com>
2025-09-12 09:12:23 -07:00
Lukas Geiger
b0d1213ac3
[Models] Prevent CUDA sync in Qwen2.5-VL ( #24741 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 16:03:55 +00:00
Lukas Geiger
57f94e88ea
[Models] Optimise and simplify _validate_and_reshape_mm_tensor ( #24742 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 15:37:37 +00:00
Kebe
684b6870e1
[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation ( #24626 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-09-12 08:01:24 -07:00
dongluw
a5b84f1cbf
[Core] Shared memory based object store for Multimodal data caching and IPC ( #20452 )
...
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00