3017 Commits

Author SHA1 Message Date
kourosh hakhamaneshi
abad204be6
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-09-23 15:49:09 -07:00
Andrew Xia
95bc60e4cb
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-23 15:46:46 -07:00
Thomas Parnell
969b4da3a6
[V0 Deprecation] Remove placeholder attn (#25510)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-23 22:12:14 +00:00
Jialin Ouyang
4f8c4b890a
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-09-23 15:11:14 -07:00
Isotr0py
ae002924e9
[CI/Build] Fix and re-enable v1 PP test on CI (#25496)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 21:58:25 +00:00
Ilya Markov
8bdd8b5c51
Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 15:53:00 -04:00
jiahanc
d5944d5146
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-09-23 15:44:35 -04:00
ElizaWszola
63400259d0
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-23 12:03:10 -07:00
Amir Samani
8c1c81a3de
[core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 14:33:06 -04:00
Hashem Hashemi
a3a7828010
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
2025-09-23 14:31:45 -04:00
Burkhard Ringlein
100b630a60
[V1][Kernel] Add triton implementation for reshape_and_cache_flash (#24503)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-23 12:52:40 -04:00
Harry Mellor
875d6def90
Add backward compatibility for GuidedDecodingParams (#25422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-23 17:07:30 +01:00
Lucas Wilkinson
cc1dc7ed6d
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-23 16:02:10 +00:00
Thomas Parnell
a903669e10
[V1] Remove V0 code paths for Hybrid models (#25400)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-23 08:26:13 -07:00
Peter Pan
da5e7e4329
[Docs] NixlConnector quickstart guide (#24249)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-09-23 14:23:22 +00:00
Chauncey
f05a4f0e34
[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-09-23 16:08:02 +02:00
Isotr0py
b6a136b58c
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 13:05:46 +00:00
Jee Jee Li
273690a50a
[Core] Optimize LoRA weight loading (#25403)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-23 18:19:45 +08:00
Andreas Hartel
4322c553a6
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>
2025-09-23 17:56:31 +08:00
Cyrus Leung
babad6e5dd
[Misc] Move DP for ViT code inside model executor dir (#25459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-23 09:20:52 +00:00
Zhikaiiii
9383cd6f10
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
2025-09-23 16:07:27 +08:00
Chendi.Xue
5774b0a1da
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-23 04:17:42 +00:00
Isotr0py
6fa78d8f23
[V0 deprecation] Remove platform v1 controling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 01:48:12 +00:00
Nicolò Lucchesi
090197034f
[Bugfix] Fix missing clear_connector_metadata (#25397)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-23 08:10:59 +08:00
Luka Govedič
d588cd2406
[Bugfix] fix custom op test (#25429)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-09-23 00:07:43 +00:00
Alec S
45d7d852d3
[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-22 23:38:19 +00:00
Cyrus Leung
f552d5e578
[CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 13:18:24 -07:00
Or Ozeri
8db2939289
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-22 12:30:36 -07:00
Luka Govedič
d5e0fca264
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-22 12:30:05 -07:00
Lucia Fang
922979bfcc
[DP] support torchrun external launcher with Data Parallelism (#24899)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-22 12:06:05 -07:00
Daisy-Ma-coder
cfbee3d0e7
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)
Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-22 10:37:43 -07:00
Nicolò Lucchesi
4cf71cc88a
[TPU] Deprecate xm.mark_step in favor of `torch_xla.sync (#25254)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-22 10:12:57 +00:00
Cyrus Leung
f92d952632
[V0 Deprecation] Remove MultiModalPlaceholderMap (#25366)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 08:49:19 +00:00
Cyrus Leung
6d0b827cbd
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 13:58:26 +08:00
WeiQing Chen
0eecb31663
[Bugfix] Fix hermes tool parser handling of non-string argument types (#22002)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-22 11:35:39 +08:00
WeiQing Chen
793be8d057
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
Signed-off-by: David Chen <530634352@qq.com>
2025-09-22 02:49:13 +00:00
Roger Wang
7b57a433da
[Model] Support Dots OCR (#24645)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: yinz-aizip <yinz@aizip.ai>
2025-09-22 02:24:40 +00:00
Deboleina
5aeb925452
Multimodal - audio tests (#25285)
Signed-off-by: Debolina Roy <debroy@redhat.com>
2025-09-22 07:07:11 +08:00
Yang Liu
04d3752329
[Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325)
Signed-off-by: Yang <lymailforjob@gmail.com>
2025-09-22 07:06:16 +08:00
Woosuk Kwon
bc6e542d9f
Remove V0 attention backends (#25351)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 16:03:28 -07:00
Woosuk Kwon
1c3ffdbecc
[V0 Deprecation] Remove V0 sampling metadata (#25345)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-21 10:37:11 -07:00
Rahul Tuli
c438b2951c
feat: Enable engine-level arguments with speculators models (#25250)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-21 11:04:45 -06:00
Woosuk Kwon
0ff8ebb2d7
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 08:52:32 -07:00
Woosuk Kwon
26e673fe93
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-21 08:52:15 -07:00
Isotr0py
cf56cf78b4
[V1] Add sliding window support to Flex Attention backend (#24089)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-21 05:08:07 +00:00
Woosuk Kwon
72dd1595b4
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:57:46 -07:00
Woosuk Kwon
572ddf83ce
[Chore] Remove unused sampler in models (#25324)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:53:20 -07:00
Woosuk Kwon
86647d1cd0
[V0 Deprecation] Remove V0 Output Processor (#25320)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:57:20 -07:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Cyrus Leung
bef180f009
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 17:50:58 +00:00