Isotr0py
7c16f3fbcc
[Doc] Add documents for multi-node distributed serving with MP backend ( #30509 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 18:02:29 +00:00
lif
ddbfbe5278
[Docs] Clarify Expert Parallel behavior for attention and MoE layers ( #30615 )
Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-13 08:37:59 -09:00
Matthew Bonanni
f5dfbbd8e9
[Docs] Remove references to VLLM_ATTENTION_BACKEND ( #30564 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-13 10:20:15 +08:00
Michael Goin
fc0119425c
Add IBM and Red Hat to compute resources sponsors ( #30581 )
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-12-13 01:34:23 +00:00
ioana ghiban
3efdc3feae
[Docs][CPU backend] Add pre-built Arm CPU Docker images ( #30491 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-11 22:03:29 +00:00
Harry Mellor
93db3256a4
Give pooling examples better names ( #30488 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 16:22:58 +00:00
ioana ghiban
17cb540248
[Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels ( #30402 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 15:57:10 +00:00
wang.yuqi
a5f9fb5960
[Deprecation] Deprecation --convert reward, use --convert embed instead. ( #30463 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-11 10:18:25 +00:00
xyDong0223
1a516557e1
[Doc] Add Baidu Kunlun XPU support ( #30455 )
Signed-off-by: xyDong0223 <dongxinyu23@gmail.com>
2025-12-11 04:52:17 +00:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Xu Song
25221b44bb
Add more docs for regex ( #30106 )
Signed-off-by: Xu Song <xusong.vip@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-11 00:12:21 +00:00
Seiji Eicher
b9e0951f96
[docs] Improve wide-EP performance + benchmarking documentation ( #27933 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-10 22:15:54 +00:00
Michael Goin
fcb894222f
[Docs] Update EPLB docs ( #30426 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-10 11:56:51 -09:00
Matthew Bonanni
794a7875ee
[Misc] Consistent case for vllm bench serve results ( #30403 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-10 09:44:02 -08:00
Mark McLoughlin
2dcbac9077
[Docs] Generate full list of metrics in user docs ( #30388 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-10 16:09:34 +00:00
Wilson Wu
3bdd426636
Fix typos in comments across multiple files ( #30345 )
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 20:05:28 -08:00
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config ( #29912 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-09 13:29:33 -05:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. ( #30056 )
2025-12-09 18:54:08 +08:00
Fanli Lin
c2e1987a6e
[Doc] update Intel GPU MM status in Feature x Hardware matrix ( #30294 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-12-09 05:16:44 +00:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors ( #30201 )
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
Ming Yang
60d17251c9
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP ( #28782 )
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-09 00:01:08 +00:00
Simon Mo
77072e93b3
[docs] governance documents ( #24801 )
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-08 12:06:20 +00:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API ( #26686 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Zhiyu
cd00c443d2
[Misc] Rename TensorRT Model Optimizer to Model Optimizer ( #30091 )
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-12-08 07:05:27 +00:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend ( #27938 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per requests ( #29988 )
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-07 04:34:24 +00:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests ( #29905 )
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
redwrasse
6476382384
prefix caching design doc sha256 now default ( #29261 )
Signed-off-by: redwrasse <mail@redwrasse.io>
2025-12-06 07:39:56 +00:00
Russell Bryant
3633035a3f
[Misc] Rename CohereForAI references to CohereLabs ( #30147 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-12-05 19:41:40 +00:00
Yanan Cao
62b3333448
[Frontend] Remove deprecated -O.xx flag ( #29991 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-05 00:47:22 -08:00
Tiger Xu / Zhonghu Xu
60a66ea2dc
[DOC]: Add kthena to integrations ( #29931 )
Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
2025-12-05 08:11:03 +00:00
Hubert de La Jonquiere
befb59e5b1
[Model] Add Holo2 reasoning parser ( #30048 )
Signed-off-by: hdlj-h <hubert@hcompany.ai>
2025-12-05 10:38:45 +08:00
TimWang
690cc3ef20
docs: update metrics design doc to use new vllm:kv_cache_usage_perc ( #30041 )
Signed-off-by: Tim <tim.wang03@sap.com>
2025-12-04 23:37:14 +00:00
Tao Yun
6dcb07f676
support qwen3-vl handle requests with embeddings ( #30037 )
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Shengqi Chen
990f806473
[Doc] clarify nightly builds in developer docs ( #30019 )
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-05 00:28:37 +08:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM ( #30049 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector ( #24718 )
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-12-04 09:51:36 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names ( #30028 )
In Prometheus, Counters always expose their actual numeric value under a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.
Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>
2025-12-04 07:46:15 +00:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
ioana ghiban
1bb17ecb39
[CPU Backend] [Doc]: Update Installation Docs for CPUs ( #29868 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 13:33:50 +00:00
ioana ghiban
15b1511a15
[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. ( #29962 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 12:56:47 +00:00
Amr Mahdi
f5d3d93c40
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds ( #29452 )
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-03 11:41:53 +00:00
Fadi Arafeh
78f4bb0ba8
[DOC] Add Arm to list of compute resources providers ( #29894 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-03 11:36:58 +00:00
Russell Bryant
b08025a83b
[Docs] Discuss api key limitations in security guide ( #29922 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-12-02 20:57:28 -08:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 15:54:28 +00:00