182 Commits

Author SHA1 Message Date
Harry Mellor
bc1d02ac85
[Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 00:13:33 -07:00
Nicolò Lucchesi
5a16fa614c
[Model] Gemma3n MM (#20495)
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-09 09:56:25 -07:00
Thomas Parnell
8a0ffd6285
Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 23:05:32 -07:00
Kunshang Ji
81c57f60a2
[XPU] upgrade torch 2.8 on for XPU (#22300)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-08 17:03:45 -07:00
rongfu.leng
904063907c
[Misc] fix openai version (#22485)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-08-08 01:12:54 -07:00
Harry Mellor
434d2f3f7a
[Docs] Add missing dependency for docs build (#22435)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-07 00:22:07 -07:00
Harry Mellor
ecbea55ca2
Update hf_xet pin to resolve hangs (#22356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-06 20:31:41 -07:00
Woosuk Kwon
134a8ee8fd
[gpt-oss] Add openai-harmony as default dependency (#22332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 00:10:14 -07:00
Harry Mellor
796bae07c5
Update transformers to v4.55 (#21931)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 22:56:14 -07:00
Woosuk Kwon
dd16bdc798
Increase openai-python version (#22316)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 21:43:21 -07:00
Rui Qiao
2ff46b8826
[Misc] Bump ray to 2.48.0 (#22123)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-08-02 19:42:00 -07:00
Woosuk Kwon
0f81b310db
[Misc] Remove upper bound in openai package version (#22060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-01 02:11:40 -07:00
Simon Mo
da31f6ad3d
Revert precompile wheel changes (#22055) 2025-08-01 08:26:24 +00:00
Michael Goin
0bd409cf01
Move flashinfer-python to optional extra vllm[flashinfer] (#21959)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-31 18:02:11 -07:00
XiongfeiWei
53c21e492e
Update torch_xla pin to 20250730 (#21956)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-31 17:26:43 +00:00
Doug Smith
58bb902186
fix(setup): improve precompiled wheel setup for Docker builds (#22025)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-31 09:52:48 -07:00
Harry Mellor
88edf5994c
[Docs] Reduce the size of the built docs (#21920)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-30 07:35:08 -07:00
Michael Goin
a33ea28b1b
Add flashinfer_python to CUDA wheel requirements (#21389)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-29 12:51:58 -07:00
Isotr0py
31084b3b1f
[Bugfix][CI/Build] Update peft version in test requirement (#21729)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 06:17:43 -07:00
Harry Mellor
1395dd9c28
[Docs] Add revision date to rendered docs (#21752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 06:12:46 -07:00
Chengji Yao
f1b286b2fb
[TPU] Update ptxla nightly version to 20250724 (#21555)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-25 17:09:00 -07:00
Kebe
396ee94180
[CI] Unifying Dockerfiles for ARM and X86 Builds (#21343)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-25 07:33:56 -07:00
Juncheng Gu
6066284914
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00
elvischenv
5a19a6c670
[Fix] Update mamba_ssm to 2.2.5 (#21421)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-24 03:25:41 -07:00
Chauncey
6da0078523
[Feat] Allow custom naming of vLLM processes (#21445)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-24 03:15:23 -07:00
Julien Denize
6d8d0a24c0
Add think chunk (#21333)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-07-23 21:51:32 -07:00
Christian Pinto
8560a5b258
[Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-07-23 11:00:23 -07:00
Li, Jiang
e3a0e43d7f
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
Woosuk Kwon
4de7146351
[V0 deprecation] Remove V0 HPU backend (#21131)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-17 16:37:36 -07:00
kYLe
4ef00b5cac
[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-17 03:07:55 -07:00
XiongfeiWei
58760e12b1
[TPU] Start using python 3.12 (#21000)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-16 19:37:44 -07:00
Michael Goin
4e7dfbe7b4
Update PyTorch to torch==2.7.1 for CUDA (#21011)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
Chauncey
b5c3b68359
[Misc] bump xgrammar version to v0.1.21 (#20992)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 19:42:16 -07:00
Harry Mellor
b637e9dcb8
Add full serve CLI reference back to docs (#20978)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:42:30 +00:00
Patrick von Platen
e7e3e6d263
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
Reid
37e2ecace2
feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 20:14:23 -07:00
XiongfeiWei
d4170fad39
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
22quinn
f326ab9c88
[Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 10:45:03 +00:00
Maroon Ayoub
66f6fbd393
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
Woosuk Kwon
f45a332886
[Sched] Enhance the logic to remove stopped requests from queues (#20739) 2025-07-12 15:33:13 -07:00
Isotr0py
01cae37713
[CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-11 20:53:07 -07:00
Luka Govedič
762be26a8e
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
2025-07-11 00:15:22 -07:00
Nathan Hoos
d6902ce79f
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
2025-07-10 15:30:26 -04:00
Harry Mellor
3482fd7e4e
[Doc] Add engine args back in to the docs (#20674)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-10 08:02:40 -07:00
Jacob Manning
bf03ff3575
[Kernel] Add Conch backend for mixed-precision linear layer (#19818)
Signed-off-by: Jacob Manning <jmanning+oss@stackav.com>
2025-07-09 13:17:55 -07:00
XiongfeiWei
849590a2a7
Update torch/xla pin to 20250703 (#20589)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-08 07:44:02 -07:00
Sanger Steel
72d14d0eed
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
2025-07-07 22:47:43 -07:00
Jee Jee Li
4ff79a136e
[Misc] Set the minimum openai version (#20539)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-07 09:15:26 +00:00
Peter Pan
5561681d04
[CI] add kvcache-connector dependency definition and add into CI build (#18193)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-04 06:49:18 -07:00
Nicolò Lucchesi
d1b689c445
[Bugfix] Fix flaky test_streaming_response test (#20363)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:46:24 +00:00