10130 Commits

Author SHA1 Message Date
Cyrus Leung
7c2e91c4e0
[Misc] Remove unused executor.apply_model (#26215)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:45:53 -07:00
Cyrus Leung
736fbf4c89
[Misc] Require merge_by_field_config argument (#26214)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:40:14 -07:00
Cyrus Leung
44ea85137a
[Model] Support nested structures for TensorSchema (#26212)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:20:32 -07:00
Harry Mellor
d3d649efec
Support expert parallel in Transformers backend (#26162)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-04 04:35:04 +00:00
Stan Wozniak
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 06:34:22 +02:00
Fadi Arafeh
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-10-04 12:16:38 +08:00
Bram Wasti
2f7dbc9b42
Add batch invariant kernel override for FlashInfer backend [2/n] (#25769)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-03 19:49:30 -07:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions (#26134)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-04 09:42:08 +08:00
Roger Wang
67bc0c003e
[Bugfix] Fix qwen3 vl dummy data generation with overrides (#26193)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-04 01:40:20 +00:00
Eugene Khvedchenya
5a05f26603
Fix issue of using only the part of video frame [Nemotron Nano] (#26186)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
2025-10-04 00:21:00 +00:00
Varun Sundar Rabindranath
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-03 20:13:13 -04:00
Wentao Ye
767cbb011d
[CI] Fix Pre-commit Mypy Error (#26181)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 16:08:03 -07:00
Angela Yi
7cfa4b24bf
[BugFix] Fix de-functionalization pass for rotary_embedding (#23953)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-03 15:44:18 -07:00
Sergei Skvortsov
b71fcd4905
[Misc] Add penalties sampling parameters to serve tool (#25974)
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
2025-10-03 15:43:14 -07:00
Sahithi Chigurupati
75003f34e8
[CI] Push multiarch manifests as nightly builds (#25764)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
2025-10-03 15:42:55 -07:00
Bowen Bao
78b8015a4d
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' (#25964)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
2025-10-03 18:31:59 -04:00
Andrew Xia
831b124151
[responsesAPI] add better error messaging for long prompts (#25724)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-03 14:33:13 -07:00
Wentao Ye
c1ffcb55da
[Refactor] Optimize FP8 MOE Backend Choice and Log (#26044)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 15:23:42 -06:00
Corey Lowman
0879736aab
[Perf] Remove hardcoded num_warps=1 (#26183)
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
2025-10-03 20:38:50 +00:00
Pavani Majety
a26917332f
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn (#25968)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-10-03 19:35:06 +00:00
Nikhil G
cd9e5b8340
Fix V1 engine serialization error with Ray distributed executor (#26148)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
2025-10-03 18:39:45 +00:00
Matthew Bonanni
300a59c4c3
Avoid division by zero in cache DS MLA kernel (#26174)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-03 17:35:17 +00:00
Harry Mellor
d76541a6c5
Stop mergify from keeping stale PRs alive (#26169)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-03 16:42:34 +00:00
Chendi.Xue
dd96465fd7
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 (#26123)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-03 08:52:26 -07:00
Jun Jiang
4f8f47e87e
Fix undefined symbol: cutlass_moe_mm_sm100 (#26098)
Signed-off-by: Jun Jiang <jasl9187@hotmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-03 15:48:32 +00:00
Cyrus Leung
d78fda7cda
[Renderer] Move Processor out of LLMEngine (#26165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 15:08:22 +00:00
Aleksandr Samarin
73a99cc2a5
[Model] Fixed stream generator for gpt-oss + spec-decoding (#26027)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
2025-10-03 13:43:41 +00:00
Xiang Si
adae0c1f43
[CI/Build] do not enforce precompilation on tpu ci tests (#25992)
Signed-off-by: Xiang Si <sixiang@google.com>
2025-10-03 13:38:42 +00:00
whx
cbf9221992
[Model] Supplement to PR 24862: Pass param prefix to LLMHead (#25805)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-03 21:34:53 +08:00
Paul Pak
5f42fc53b6
[backends][short_conv] CUDA graph piecewise edits (#24215)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-03 12:59:48 +00:00
Yannick Schnider
8ee846c27c
[Bugfix] Re-enable prefill of max model length (#24446)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-03 14:13:34 +02:00
Yang Liu
812b7f54a8
[Renderer] Move Processor out of AsyncLLM (#24138)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 11:29:45 +00:00
Sage Moore
5f2cacdb1e
Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-10-03 11:28:22 +00:00
Egor
aa5053e3fe
[Doc] Fixed shape description for fused_batched_moe.py (#25668)
Signed-off-by: Egor <e.a.krivov@gmail.com>
2025-10-03 04:00:23 -07:00
Wenlong Wang
79aa244678
[Multi Modal] Configurable MM Profiling (#25631)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-03 03:59:10 -07:00
kyt
2ed3f20dba
[openai] Fix missing tool usage check (system message) (#24768)
Signed-off-by: kyt <eluban4532@gmail.com>
2025-10-03 18:55:44 +08:00
Nicolò Lucchesi
48f309029a
[NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-03 10:47:59 +00:00
Thomas Parnell
0e93ac0b3a
[CI] Fix distributed hybrid tests in CI (#26155)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-03 09:14:18 +00:00
Yannick Schnider
5446ad1d24
[test utils] correct wrong typing (#26159)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-03 02:11:49 -07:00
Cyrus Leung
f9a8084e48
[Model] Use merge_by_field_config for MM models (InternVL family) (#26153)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 01:59:06 -07:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
2025-10-03 08:56:25 +00:00
Jiangyun Zhu
eb0fa43868
[Perf] Optimize reshape_and_cache CUDA Kernel (#25955)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Liu-congo <1502632128@qq.com>
2025-10-03 01:33:46 -07:00
Cyrus Leung
0ad9951c41
[Input] Remove unused prompt field (#26097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 00:23:21 -07:00
Varun Sundar Rabindranath
8c9117181d
[Misc] Remove typing.List (#26150)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-03 07:00:33 +00:00
ahao-anyscale
c4b48d3c0f
[BUG] Reorder model config creation (#26124)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-10-03 14:59:36 +08:00
Harry Mellor
10d765482d
FusedMoE support for the Transformers backend (#22650)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-02 23:12:15 -07:00
Cyrus Leung
39b643dc1a
[Model] Use merge_by_field_config for MM models (G) (#26117)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-02 22:38:29 -07:00
Zhewen Li
711f485643
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD (#26068)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-02 22:37:25 -07:00
TJian
9c5ee91b2a
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-10-02 22:34:53 -07:00
Tyler Michael Smith
27edd2aeb4
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-10-02 22:21:01 -07:00