5386 Commits

Author SHA1 Message Date
Siyuan Liu
b15fd2be2a
[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-21 03:05:28 +00:00
Woosuk Kwon
e588ac237c
Add an example for reproducibility (#15262)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 19:55:47 -07:00
Cody Yu
5df2da5b97
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-20 19:27:46 -07:00
Woosuk Kwon
11b986b3fb
[Docs] Trim the latest news in README (#15261)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 19:24:21 -07:00
Chih-Chieh Yang
296f927f24
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14857)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-20 19:21:08 -07:00
Travis Johnson
0032903a5b
[Bugfix] detect alibi and revert to FA2 (#15231)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-03-20 19:20:16 -07:00
Hyesoo Yang
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00
Harry Mellor
6edbfa924d
Mention extra_body as a way top pass vLLM only parameters using the OpenAI client (#15240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 19:18:36 -07:00
Isotr0py
1e508343e1
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-20 19:18:04 -07:00
Sage Moore
2e0b4cfde0
[ROCM] Upgrade torch to 2.6 (#15244)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-20 19:17:33 -07:00
Jee Jee Li
10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-20 19:17:12 -07:00
Lu Fang
d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-21 10:01:11 +08:00
Varun Sundar Rabindranath
0cfe7d386d
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-21 09:28:53 +08:00
Woosuk Kwon
0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-20 17:50:43 -07:00
Yu Chin Fabian Lim
06dd08256f
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. (#14617)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-03-21 00:44:37 +00:00
clark
b89d89f456 fix rebase
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:32:21 +08:00
clark
8355358fb3 add unlimited HWM
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:20:12 +08:00
clark
c0b1443345 fix mypy
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:20:12 +08:00
clark
d35dace985 refactor zmq msg to object
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:20:12 +08:00
clark
912031ceb5 refactor disagg
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:20:12 +08:00
clark
4f13e89143 fix SIM105
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
b9a7dbe769 remove default socket address value
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
0cb2e05256 change log level and fix some comments
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
d6945ecdf0 change disagg_prefill example to use zmq
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
298298f97d remove invalid zmq benchmark code
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
6c8fae82dd run format
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:19 +08:00
clark
16ed827378 add benchmark shell
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:18:08 +08:00
clark
8fa9df7987 run format.sh
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:57 +08:00
clark
27c1afe88b fix ThreadProxy
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:57 +08:00
clark
ee6607332e create proxy sockets in the proxy function for thread safety
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:57 +08:00
clark
7fbf70db57 1. replace tpc:// with ipc:// \n 2. fix json response
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:57 +08:00
clark
2c31e4c3ea Run yapf and ruff
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:57 +08:00
clark
187f112ccd 1. fix mypy issue
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:44 +08:00
clark
897db7b93d Replace zmq.asyncio.Context().term() with zmq.asyncio.Context().destroy(linger=0) for immediate termination
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:44 +08:00
clark
b7ffb43792 update disagg_connect test_request.py
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:44 +08:00
clark
6e1fba8a73 1. connect_parser set --prefill-addr and --decode-addr are required
2.To more accurately reflect its purpose, we will rename connect.py to disagg_connector.py.

Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:44 +08:00
clark
bfde1688e7 add /v1/completions stream support
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:17:44 +08:00
clark
905424ed65 add identity url headers
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:15:42 +08:00
clark
5d20f389d6 add vllm connect cmd
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:15:42 +08:00
clark
2a0cb78016 add test py
Signed-off-by: clark <panf2333@gmail.com>
2025-03-21 08:15:42 +08:00
Woosuk Kwon
2b22290ce0
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 15:24:16 -07:00
Jason
d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
2025-03-20 10:01:02 -07:00
Chi Zhang
086b56824c
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-21 00:30:04 +08:00
Harry Mellor
5a0905ba2a
Replace misc issues with link to forum (#15226)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 23:18:20 +08:00
Richard Liu
a8f12a63fd
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-20 14:59:33 +00:00
Harry Mellor
69ae2380c6
Add user forum to README (#15220)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 22:39:51 +08:00
Cyrus Leung
27261e40a6
[Bugfix] Multi-video inference on LLaVA-Onevision (#15082)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-20 14:10:45 +00:00
Quang-Linh LE
e3f813c33b
[macOS] Ugrade pytorch to 2.6.0 (#15129) 2025-03-20 01:22:40 -07:00
Wang Ran (汪然)
c607a2652b
Fixing Imprecise Type Annotations (#15192) 2025-03-20 01:19:55 -07:00
Kevin H. Luu
3d45e3d749
[release] Tag vllm-cpu with latest upon new version released (#15193) 2025-03-20 01:19:10 -07:00