Iceber Gu
33a3d6c798
fix LoRA-related examples ( #29956 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-12-04 11:48:30 +08:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template ( #29902 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-12-03 16:12:58 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True ( #29950 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:05:10 +00:00
Harry Mellor
6fc5841db1
Fix some more Transformers nightly tests ( #29872 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 21:49:44 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 15:54:28 +00:00
Zhuohan Li
d0cd728907
[Core] Support reseting all running requests' KV while calling reset_prefix_cache ( #28827 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-02 02:25:05 +00:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer ( #29224 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code ( #29611 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00
Eldar Kurtić
c32a18cbe7
Attempt to fix GPU OOM in a spec-decoding test ( #29419 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-25 14:23:36 -05:00
Fanli Lin
a21256c463
Add TP CLI argument to multimodal inference examples ( #29301 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-25 06:03:20 +00:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support ( #29327 )
...
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-25 03:28:51 +00:00
Chenheli Hua
839c6b7b72
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. ( #27721 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-24 19:24:37 +00:00
Fanli Lin
ed40d85929
[BugFix] Fix R-VL model loading error ( #29299 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-23 22:48:45 -08:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers ( #29262 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Josh Moore
c309bb5245
[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format ( #29249 )
...
Signed-off-by: joshiemoore <joshiemoore98@gmail.com>
2025-11-24 00:47:54 +00:00
Cyrus Leung
389aa1b2eb
[Doc] Update more docs with respect to V1 ( #29188 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-23 10:58:48 +08:00
Andrew Xia
742e9ff6b3
[responsesAPI] parse reasoning item input ( #28248 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 15:42:11 +08:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references ( #29088 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
Rob Mulla
dd39f91edb
[Doc] cleanup TPU documentation and remove outdated examples ( #29048 )
...
Signed-off-by: Rob Mulla <rob.mulla@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 00:05:59 +00:00
Pan Li
e5bfcb6a88
[BugFix][PD]: make example proxy usable with P2pNcclConnector ( #26628 )
...
Signed-off-by: PAN <1162953505@qq.com>
2025-11-20 17:38:31 +00:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files ( #28945 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-18 22:52:30 -08:00
Jerry Zhang
da94c7c0eb
Move online quantization to model.load_weights ( #26327 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-18 16:52:41 -08:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 ( #28921 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Julien Denize
085424808e
Remove audio optional dependency for mistral-common ( #28722 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 09:54:38 -08:00
Nicolò Lucchesi
6f1e7f7226
[DisaggEverything] Tokens in<>out /generate endpoint ( #24261 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 09:58:01 -07:00
Fanli Lin
c9fe6abe7c
[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing ( #28630 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-13 13:06:06 +00:00
Chauncey
5c9ad138d5
[Frontend] supports interleaved thinking ( #28531 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-13 16:14:13 +08:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
2025-11-11 18:58:33 -08:00
wangxiyuan
e1710393c4
[[V0 deprecation]]Remove VLLM_USE_V1 env ( #28204 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-11 18:22:16 -07:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-09 05:33:46 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
...
Signed-off-by: Ev Lacey <elacey@nvidia.com>
2025-11-08 22:04:16 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart ( #27497 )
...
Signed-off-by: Fang Han <fhan0520@gmail.com>
2025-11-06 20:27:16 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony ( #26874 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-06 10:40:03 +00:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-05 16:53:33 +00:00
zhang-prog
40b69e33e7
[Model] Add PaddleOCR-VL Model Support ( #27758 )
...
Signed-off-by: zhangyue <zhangyue66@baidu.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-03 19:04:22 +08:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-30 12:13:05 +00:00
22quinn
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-29 10:26:06 -07:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-10-23 20:19:48 +00:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
2025-10-23 14:46:18 +00:00
fangpings
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json ( #27133 )
...
Signed-off-by: Fangping Shi <fangping_shi@apple.com>
Co-authored-by: Fangping Shi <fangping_shi@apple.com>
2025-10-22 20:58:36 -07:00
Isotr0py
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support ( #27361 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 17:15:38 -07:00
Luciano Martins
e05a6754a8
[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… ( #27309 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2025-10-22 10:05:34 -07:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds ( #27204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Isotr0py
675aa2ec64
[Model] Upstream Deepseek-OCR model ( #27247 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-22 07:59:15 -07:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response ( #27066 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-22 18:38:57 +08:00
Yi Zhang
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
...
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com>
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-20 02:31:26 +00:00