1699 Commits

Author SHA1 Message Date
Yash Pratap Singh
c9f7081f9c
[LoRA]: Add lora support to qwen-2.5-omni (#24231) 2025-09-04 05:50:50 -07:00
Jiangyun Zhu
eafa8dcde6
[Model] Add pp support for hunyuan (#24212)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-04 03:58:26 -07:00
TJian
6c7af8110a
[Doc] Update vLLM Singapore Meetup info (#24234)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-09-04 02:58:18 -07:00
bingchen-mi
e7fc70016f
[Model] Add MiDashengLM model support (#23652)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-04 00:08:09 -07:00
bnellnm
e9b92dcd89
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
nopperl
fa4311d85f
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
2025-09-03 08:24:02 -07:00
youkaichao
f38035c123
[distributed][rl] remove nccl cumem env var override (#24141)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-03 06:45:25 +00:00
co63oc
1bd007f234
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
Peter Pan
0e1759cd54
[docs] add SYS_NICE cap & security-opt for docker/k8s (#24017)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-02 17:27:20 +00:00
Christian Berge
8bd5844989
correct LWS deployment yaml (#23104)
Signed-off-by: cberge908 <42270330+cberge908@users.noreply.github.com>
2025-09-02 12:04:59 +00:00
WeiQing Chen
2f0bab3f26
[Model] Support dp on ViT on GLM-4.5V (#23168)
Signed-off-by: David Chen <530634352@qq.com>
2025-09-02 10:48:18 +00:00
WeiQing Chen
a0e0efd6bd
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-09-01 16:56:56 +00:00
Christian Pinto
cf91a89dd2
[docs][misc] IOProcessor plugins fixes (#24046)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-09-01 09:17:41 -07:00
Julien Debache
41c80698b3
Document multi-proc method selection for profiling (#23802)
Signed-off-by: jdebache <jdebache@nvidia.com>
2025-09-01 06:28:26 -07:00
Kwai-Keye
7c8271cd1e
[Model]: support KeyeVL-1_5-8B (#23838)
Signed-off-by: wangruitao <wangruitao@kuaishou.com>
Co-authored-by: wangruitao <wangruitao@kuaishou.com>
2025-09-01 03:50:27 -07:00
Kay Yan
3e330fcb21
[Doc]: Fix CPU install docs: force torch-backend=cpu to avoid GPU torchvision errors (#24033)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-09-01 03:34:52 -07:00
Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
Roger Wang
749be00a98
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-30 18:01:22 -07:00
sadegh.shokatian
379ea2823a
Add LoRA support for DeepSeek models (V2, V3, R1-0528) (#23971)
Signed-off-by: sadeghja1070 <sadegh.ja1070@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-30 06:40:02 -07:00
Jiangyun Zhu
3a6acad431
[Model] Enable encoder DP for MiniCPM-V (#23948)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-30 06:31:26 -07:00
Thomas Parnell
1c26b42296
[Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-29 18:47:58 +00:00
Li, Jiang
ad39106b16
[CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-29 02:19:58 -07:00
Didier Durand
d99c3a4f7b
[Doc]: fix typos in .md files (including those of #23751) (#23825)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-28 04:38:19 -07:00
Isotr0py
c5d004aaaf
[Model] Add PP support and VLM backbone compatability for GPT-OSS (#23680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-28 16:03:28 +08:00
wang.yuqi
11a7fafaa8
[New Model]: Support GteNewModelForSequenceClassification (#23524)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-28 15:36:42 +08:00
Harry Mellor
2b61d2e22f
[Docs] Remove in-tree Gaudi install instructions (#23628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-27 09:22:21 -07:00
Isotr0py
841490434a
[Model] Enable native HF format InternVL support (#23742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-27 14:45:17 +00:00
Cyrus Leung
fe8d7b6f03
[Model] Interface to enable batch-level DP support (#23733)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-27 06:41:22 -07:00
Harry Mellor
16dc4052b0
Fix pre-commit on main (#23747)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-27 06:39:48 -07:00
rebel-hongseok
8dd2baa597
Add vLLM Korea Meetup in the README.md and meetups.md (#23746)
Signed-off-by: rebel-hongseok <hongseok@rebellions.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-27 06:25:49 -07:00
Thomas Parnell
704432af3c
[V1] [Hybrid] Disable prefix caching by default for hybrid or mamba-based models (#23716)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-27 12:51:54 +00:00
tc-mb
9d30de4469
[model] Support MiniCPM-V 4.5 (#23586)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Pate Motter <patemotter@google.com>
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: tc-mb <157115220+tc-mb@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Matúš Námešný <matus.namesny@ameria.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: oye93 <en.ouyang93@outlook.com>
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Tianyu Li <tianyu.li@arm.com>
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com>
Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
Signed-off-by: Wei Wei <wwei6@meta.com>
Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Pate Motter <p@temotter.com>
Co-authored-by: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: weiliang <weiliangl@nvidia.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Raghavan <oneraghavan@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Matúš Námešný <matus@namesny.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: En Ouyang <en.ouyang93@outlook.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: nvjullin <jullin@nvidia.com>
Co-authored-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: TianyuLi0 <116711075+TianyuLi0@users.noreply.github.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Co-authored-by: Federico <65908512+coval3nte@users.noreply.github.com>
Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
Co-authored-by: yzds <41983536+youzhedian@users.noreply.github.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: czhu-cohere <conway.zhu@cohere.com>
Co-authored-by: Wei <weiweinpu@gmail.com>
Co-authored-by: Yiheng Xu <charlesyihengxu@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com>
2025-08-27 05:38:00 -07:00
Michael Yao
1f7a9c95e4
[Docs] Fix a 1-2-3 list and style issues in tpu.md (#23729)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-08-27 05:37:52 -07:00
Michael Yao
5bd9f84158
[Docs] Fix an admonition important (#23726)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-08-27 02:50:09 -07:00
Cyrus Leung
69244e67e6
[Core] Use key-only cache for BaseMultiModalProcessor (#23018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-27 14:19:13 +08:00
CSWYF3634076
644d57d531
[Model] Add Ernie4.5 VL Model Support (#22514)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-08-26 21:02:55 -07:00
Harry Mellor
6dab89b8ec
[Docs] Fix math rendering in docs (#23676)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 18:47:08 -07:00
Harry Mellor
6421b66bf4
[Docs] Move quant supported hardware table to README (#23663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 22:26:46 +00:00
Chen Zhang
d696f86e7b
[doc] Hybrid KV Cache Manager design doc (#22688)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 20:19:05 +00:00
Isotr0py
9816b81f5f
[Model] Enable video support for InternVL3.5 models (#23658)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-26 19:46:52 +00:00
Thomas Parnell
227e231b55
[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-26 18:33:16 +00:00
Harry Mellor
379f828fba
[Docs] Reduce requirements for docs build (#23651)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 15:43:28 +00:00
Didier Durand
7c04779afa
[Doc]: fix various spelling issues in multiple files (#23636)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-26 14:05:29 +00:00
Harry Mellor
164b2273c8
[Docs] Fix broken links to docs/api/summary.md (#23637)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 13:00:18 +00:00
Raghavan
ff77764f86
Fix CLI parameter documentation inconsistency in pooling_models.md (#23630) 2025-08-26 01:05:37 -07:00
Harry Mellor
bfc1edc9f5
[Docs] Fix titles for multi-file examples that are rendered in the docs (#23573)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 00:16:44 -07:00
Terrence Zhao
7b6a837275
[Docs] Update Documentation of Cohere Command-A Models (#23584)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com>
2025-08-25 21:53:52 +00:00
Cyrus Leung
e269be2ba2
[Doc] Add caution for API server scale-out (#23550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-25 06:14:15 -07:00
youkaichao
d0a4a3f645
[misc] add shanghai meetup (#23535)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-25 17:00:03 +08:00
Didier Durand
47455c424f
[Doc: ]fix various typos in multiple files (#23487)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-25 00:04:04 +00:00