Thomas Parnell
ed3aeb25a4
[V1] [Hybrid] Remove code to override default CUDA graph configuration ( #26226 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 13:47:48 +00:00
yuafng
86ee949128
Fix tensor device and dtype placement in Qwen2VL model ( #26219 )
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Yuanfeng Li <yuanfengli@meta.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-04 06:41:39 -07:00
Cyrus Leung
4570535ec4
[Model] CLIP Embedding Support ( #26010 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 06:21:42 -07:00
Nicolò Lucchesi
2a6dc67eb5
[Bugfix] Fix _reqs_to_process leak on abort ( #26012 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-04 11:39:31 +00:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length ( #26168 )
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-04 09:59:26 +00:00
Luca Soldaini
d0df145c2a
Add Olmo 3 reasoning parser ( #26054 )
Signed-off-by: Luca Soldaini <luca@soldaini.net>
2025-10-04 17:48:29 +08:00
Cyrus Leung
1838cd4860
Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" ( #26220 )
2025-10-04 02:45:08 -07:00
Huamin Li
7d6b03381e
[CI Failure] fix_test_auto_prefix_cache_support ( #26053 )
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-04 02:44:49 -07:00
Cyrus Leung
7c2e91c4e0
[Misc] Remove unused executor.apply_model ( #26215 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:45:53 -07:00
Cyrus Leung
736fbf4c89
[Misc] Require merge_by_field_config argument ( #26214 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:40:14 -07:00
Cyrus Leung
44ea85137a
[Model] Support nested structures for TensorSchema ( #26212 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:20:32 -07:00
Harry Mellor
d3d649efec
Support expert parallel in Transformers backend ( #26162 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-04 04:35:04 +00:00
Stan Wozniak
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching ( #25752 )
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 06:34:22 +02:00
Fadi Arafeh
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack ( #25948 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-10-04 12:16:38 +08:00
Bram Wasti
2f7dbc9b42
Add batch invariant kernel override for FlashInfer backend [2/n] ( #25769 )
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-03 19:49:30 -07:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions ( #26134 )
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-04 09:42:08 +08:00
Roger Wang
67bc0c003e
[Bugfix] Fix qwen3 vl dummy data generation with overrides ( #26193 )
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-04 01:40:20 +00:00
Eugene Khvedchenya
5a05f26603
Fix issue of using only the part of video frame [Nemotron Nano] ( #26186 )
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
2025-10-04 00:21:00 +00:00
Varun Sundar Rabindranath
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels ( #25488 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-03 20:13:13 -04:00
Wentao Ye
767cbb011d
[CI] Fix Pre-commit Mypy Error ( #26181 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 16:08:03 -07:00
Angela Yi
7cfa4b24bf
[BugFix] Fix de-functionalization pass for rotary_embedding ( #23953 )
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-03 15:44:18 -07:00
Sergei Skvortsov
b71fcd4905
[Misc] Add penalties sampling parameters to serve tool ( #25974 )
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
2025-10-03 15:43:14 -07:00
Sahithi Chigurupati
75003f34e8
[CI] Push multiarch manifests as nightly builds ( #25764 )
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
2025-10-03 15:42:55 -07:00
Bowen Bao
78b8015a4d
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' ( #25964 )
Signed-off-by: Bowen Bao <bowenbao@amd.com>
2025-10-03 18:31:59 -04:00
Andrew Xia
831b124151
[responsesAPI] add better error messaging for long prompts ( #25724 )
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-03 14:33:13 -07:00
Wentao Ye
c1ffcb55da
[Refactor] Optimize FP8 MOE Backend Choice and Log ( #26044 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 15:23:42 -06:00
Corey Lowman
0879736aab
[Perf] Remove hardcoded num_warps=1 ( #26183 )
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
2025-10-03 20:38:50 +00:00
Pavani Majety
a26917332f
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn ( #25968 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-10-03 19:35:06 +00:00
Nikhil G
cd9e5b8340
Fix V1 engine serialization error with Ray distributed executor ( #26148 )
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
2025-10-03 18:39:45 +00:00
Matthew Bonanni
300a59c4c3
Avoid division by zero in cache DS MLA kernel ( #26174 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-03 17:35:17 +00:00
Harry Mellor
d76541a6c5
Stop mergify from keeping stale PRs alive ( #26169 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-03 16:42:34 +00:00
Chendi.Xue
dd96465fd7
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 ( #26123 )
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-03 08:52:26 -07:00
Jun Jiang
4f8f47e87e
Fix undefined symbol: cutlass_moe_mm_sm100 ( #26098 )
Signed-off-by: Jun Jiang <jasl9187@hotmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-03 15:48:32 +00:00
Cyrus Leung
d78fda7cda
[Renderer] Move Processor out of LLMEngine ( #26165 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 15:08:22 +00:00
Aleksandr Samarin
73a99cc2a5
[Model] Fixed stream generator for gpt-oss + spec-decoding ( #26027 )
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
2025-10-03 13:43:41 +00:00
Xiang Si
adae0c1f43
[CI/Build] do not enforce precompilation on tpu ci tests ( #25992 )
Signed-off-by: Xiang Si <sixiang@google.com>
2025-10-03 13:38:42 +00:00
whx
cbf9221992
[Model] Supplement to PR 24862: Pass param prefix to LLMHead ( #25805 )
Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-03 21:34:53 +08:00
Paul Pak
5f42fc53b6
[backends][short_conv] CUDA graph piecewise edits ( #24215 )
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-03 12:59:48 +00:00
Yannick Schnider
8ee846c27c
[Bugfix] Re-enable prefill of max model length ( #24446 )
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-03 14:13:34 +02:00
Yang Liu
812b7f54a8
[Renderer] Move Processor out of AsyncLLM ( #24138 )
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 11:29:45 +00:00
Sage Moore
5f2cacdb1e
Quick fix for IMA with the Prefix Prefill kernel during graph capture ( #25983 )
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-10-03 11:28:22 +00:00
Egor
aa5053e3fe
[Doc] Fixed shape description for fused_batched_moe.py ( #25668 )
Signed-off-by: Egor <e.a.krivov@gmail.com>
2025-10-03 04:00:23 -07:00
Wenlong Wang
79aa244678
[Multi Modal] Configurable MM Profiling ( #25631 )
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-03 03:59:10 -07:00
kyt
2ed3f20dba
[openai] Fix missing tool usage check (system message) ( #24768 )
Signed-off-by: kyt <eluban4532@gmail.com>
2025-10-03 18:55:44 +08:00
Nicolò Lucchesi
48f309029a
[NIXL][Misc] Expose metrics from NIXL for logging to CLI ( #25388 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-03 10:47:59 +00:00
Thomas Parnell
0e93ac0b3a
[CI] Fix distributed hybrid tests in CI ( #26155 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-03 09:14:18 +00:00
Yannick Schnider
5446ad1d24
[test utils] correct wrong typing ( #26159 )
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-03 02:11:49 -07:00
Cyrus Leung
f9a8084e48
[Model] Use merge_by_field_config for MM models (InternVL family) ( #26153 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 01:59:06 -07:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput ( #24947 )
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
2025-10-03 08:56:25 +00:00
Jiangyun Zhu
eb0fa43868
[Perf] Optimize reshape_and_cache CUDA Kernel ( #25955 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Liu-congo <1502632128@qq.com>
2025-10-03 01:33:46 -07:00