Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
2025-11-11 18:58:33 -08:00
Li, Jiang
7f829be7d3
[CPU] Refactor CPU attention backend ( #27954 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
...
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
wangxiyuan
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle ( #28116 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-06 16:05:32 +08:00
Walter Beller-Morales
752ddeacaa
[Core] add support for reasoning parser plugins ( #28075 )
...
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>
2025-11-06 01:15:06 +08:00
Chauncey
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 15:37:02 +08:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-04 18:01:56 +00:00
pwschuurman
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection ( #27846 )
...
Signed-off-by: Peter Schuurman <psch@google.com>
2025-11-03 14:31:03 +00:00
Yihua Cheng
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-01 07:17:07 +00:00
Vinay R Damodaran
5e8862e9e0
[Feature] Pydantic validation for scheduler.py and structured_outputs.py ( #26519 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-31 18:05:50 +00:00
Nick Hill
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill ( #27826 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-31 10:57:45 -07:00
Wentao Ye
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-30 11:32:17 +00:00
Kevin H. Luu
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2025-10-29 10:29:49 +00:00
Dipika Sikka
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-10-29 00:54:21 -07:00
22quinn
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2025-10-28 13:52:47 +00:00
Asaf Joseph Gardin
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args ( #27289 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-10-28 12:54:24 +00:00
Cyrus Leung
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine ( #27588 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-28 12:20:46 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils ( #27567 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 15:30:38 +00:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00
Lucia Fang
fc059c7061
[Bugfix] Fix args settings for guided decoding args ( #27375 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-10-23 07:34:06 +00:00
Andrew Sansom
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds ( #27219 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-10-22 22:18:07 -07:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds ( #27204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
Roger Wang
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-21 00:30:10 -07:00
Nick Hill
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code ( #27215 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 02:02:10 +00:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Fadi Arafeh
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend ( #27035 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-18 13:30:21 +00:00
iAmir97
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils ( #27143 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
Hanchenli
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot ( #25515 )
...
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com>
Signed-off-by: Wei Wei <wwei6@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
Co-authored-by: Wei Wei <weiweinpu@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-17 21:55:54 -07:00
Lehua Ding
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env ( #25887 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com>
Signed-off-by: Lehua Ding <lehuading@qq.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-10-17 22:16:18 +00:00
Patrick von Platen
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) ( #26367 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-17 08:24:42 -07:00
Yongtao Huang
be429d0cfd
Fix incorrect docstring for stop_profile() method ( #27101 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-10-17 06:30:23 -07:00
Harry Mellor
aa255ff55a
Support set in the CLI generation ( #27031 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 18:07:18 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-15 22:06:02 -07:00
Cyrus Leung
9c4cb68339
[Chore] Remove SupportsV0Only interface and update supported models docs ( #26783 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 04:55:10 -07:00
Ryan Li
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
...
Signed-off-by: Ryan Li <ryanli@ryanli.org>
2025-10-14 06:52:21 +00:00
Michael Goin
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-13 18:12:52 -07:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Ashwin Phadke
ab196edefb
Remove LoRA bias support ( #25807 )
...
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization ( #26427 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Simon Danielsson
e4791438ed
[Feature] Use pydantic validation in lora.py and load.py configs ( #26413 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2025-10-09 02:38:33 -07:00
Rui Qiao
467a4f98f1
[Misc] Redact ray runtime env before logging ( #26302 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-08 17:43:34 -07:00
Vinay R Damodaran
b25d7b5657
[Feature] Change cache.py with pydantic validation ( #26390 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 11:12:59 -07:00
Harry Mellor
2f99f2f506
Tidy vllm/config/__init__.py to only add classes and functions ( #26405 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 07:10:00 -07:00
Sage Moore
2111b4643c
[Core] Simplify the Dp padding/should ubatch coordination logic ( #25768 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-07 01:57:49 +00:00