2884 Commits

Author SHA1 Message Date
Peter Pan
b5ee1e3261
Remove deprecated PyNcclConnector (#24151)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-09-03 22:49:16 +00:00
Matthew Bonanni
a742322092
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-03 14:05:24 -04:00
bnellnm
e9b92dcd89
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
nopperl
fa4311d85f
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
2025-09-03 08:24:02 -07:00
wang.yuqi
51383bd472
[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-03 17:23:56 +08:00
Isotr0py
9c99e4871f
[Misc] Clean up deadcode for legacy processing pipeline (#24153)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-03 08:34:29 +00:00
dsinghvi
70549c1245
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-03 16:13:11 +08:00
Didier Durand
d7e1e59972
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:05:45 -07:00
co63oc
1bd007f234
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
afeldman-nm
136d853e65
[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-03 02:52:51 +00:00
Thomas Parnell
d328f7894f
[CI] Enable all hf transformers baselines in test_hybrid (#23936)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-02 20:15:06 +00:00
Mark McLoughlin
2417798471
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-02 18:10:10 +00:00
Chenheli Hua
f399182e8c
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-09-02 17:55:32 +00:00
Michael Goin
e66ed3e675
[CI Failure] Skip failing nvfp4 silu test (#23959)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-02 13:18:15 -04:00
Aziz
ce30dca5c4
[CI]: reduce HTTP calls inside entrypoints openai tests (#23646)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-02 10:49:32 +00:00
Didier Durand
fad73be1a5
[Doc]: fix typos in Python comments (#24077)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 02:38:55 -07:00
Asaf Joseph Gardin
2b41cbbf03
[V1][Mamba1] - FP32 SSM Kernel Support (#23506)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-09-01 20:53:00 -07:00
WeiQing Chen
a0e0efd6bd
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-09-01 16:56:56 +00:00
Christian Pinto
cf91a89dd2
[docs][misc] IOProcessor plugins fixes (#24046)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-09-01 09:17:41 -07:00
Kwai-Keye
7c8271cd1e
[Model]: support KeyeVL-1_5-8B (#23838)
Signed-off-by: wangruitao <wangruitao@kuaishou.com>
Co-authored-by: wangruitao <wangruitao@kuaishou.com>
2025-09-01 03:50:27 -07:00
Nicolò Lucchesi
d46934b229
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-01 18:07:46 +08:00
Code Jesus
422e793fa6
[Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser (#22769)
Signed-off-by: Devon Peroutky <devon@kindo.ai>
2025-09-01 14:07:54 +08:00
Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
Isotr0py
ff0e59d83a
[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-31 22:52:20 -07:00
Roger Wang
749be00a98
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-30 18:01:22 -07:00
Ning Xie
5490d633ce
[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843)
2025-08-30 11:22:14 +00:00
Jee Jee Li
628d00cd7b
[Bugfix] Fix test_lora_resolvers.py (#23984)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-30 11:16:11 +00:00
Roger Wang
ee52a32705
[CI] Move testing image from remote URL to S3 (#23980)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-29 21:41:25 -07:00
dubejf
5b31cb1781
[Bugfix] Fix --config arg expansion called from api_server.py (#23944)
Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:39 -07:00
Roger Wang
d660c98c1b
[CI] Fix unavailable image remote URL (#23966)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-29 15:40:04 -07:00
Yong Hoon Shin
8c3e199998
Revert gemma3n fast prefill changes (#23897)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-29 12:16:57 -07:00
22quinn
4d7fe40fc0
[RL][BugFix] Fix missing tokenizer error for token-in-token-out (#23904)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-30 01:09:55 +08:00
Nick Hill
d90d8eb674
[BugFix] Async scheduling and PP compatibility with DP (#23770)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-29 08:17:27 -07:00
EduardDurech
1cf3753b90
[MODEL] Apertus and XIELU (#23068)
Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Co-authored-by: AllenHaoHuang <allenhuangdd@gmail.com>
2025-08-29 20:29:18 +08:00
Adit Chawdhary
4f7cde7272
Adds json_count_leaves utility function (#23899)
Signed-off-by: aditchawdhary <aditxy@hotmail.com>
2025-08-29 05:28:13 -07:00
Huy Do
67c14906aa
Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-29 18:57:35 +08:00
Flora Feng
69f46359dd
[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-08-29 18:36:57 +08:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Maximilien de Bayser
2554b27baa
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-29 00:04:02 -07:00
Jee Jee Li
b4f9e9631c
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Isotr0py
98ac0cb32d
[Bugfix] Use ReplicatedLinear for SequenceClassification head (#23836)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-29 04:41:20 +00:00
Russell Bryant
c8b3b299c9
[tests] Improve speed and reliability of test_transcription_api_correctness (#23854)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-29 04:25:33 +00:00
Woosuk Kwon
b668055a11
[V0 Deprecation] Remove V0 Samplers test (#23862)
2025-08-28 18:05:52 -07:00
Yong Hoon Shin
cb293f6a79
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-28 14:54:30 -07:00
elvischenv
16a45b3a28
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
wang.yuqi
11a7fafaa8
[New Model]: Support GteNewModelForSequenceClassification (#23524)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-28 15:36:42 +08:00
yzds
186aced5ff
[Kernel] cuda kernels for upcoming decode context parallel feature (#23791)
Co-authored-by: hongchao <hongchao@msh.team>
2025-08-28 15:29:11 +08:00
Jiangyun Zhu
c07a73317d
[CI] enable idefics3 and fuyu-8b test in multimodal test (#23790)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-08-28 14:51:24 +08:00
Kyle Sayers
22feac8e95
[Transform] [Quantization] Add transforms to compressed tensors (#22486)
2025-08-28 02:43:48 -04:00
Hanchenli
5da4f5d857
[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713)
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
2025-08-28 00:44:52 +00:00