Siyuan Liu
|
b15fd2be2a
|
[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-21 03:05:28 +00:00 |
|
Woosuk Kwon
|
e588ac237c
|
Add an example for reproducibility (#15262)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 19:55:47 -07:00 |
|
Cody Yu
|
5df2da5b97
|
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-20 19:27:46 -07:00 |
|
Woosuk Kwon
|
11b986b3fb
|
[Docs] Trim the latest news in README (#15261)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 19:24:21 -07:00 |
|
Chih-Chieh Yang
|
296f927f24
|
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14857)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-03-20 19:21:08 -07:00 |
|
Travis Johnson
|
0032903a5b
|
[Bugfix] detect alibi and revert to FA2 (#15231)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-03-20 19:20:16 -07:00 |
|
Hyesoo Yang
|
47195057e9
|
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-03-20 19:19:40 -07:00 |
|
Harry Mellor
|
6edbfa924d
|
Mention extra_body as a way to pass vLLM-only parameters using the OpenAI client (#15240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 19:18:36 -07:00 |
|
Isotr0py
|
1e508343e1
|
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 19:18:04 -07:00 |
|
Sage Moore
|
2e0b4cfde0
|
[ROCM] Upgrade torch to 2.6 (#15244)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-20 19:17:33 -07:00 |
|
Jee Jee Li
|
10f55fe6c5
|
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-20 19:17:12 -07:00 |
|
Lu Fang
|
d3ccbd6350
|
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-21 10:01:11 +08:00 |
|
Varun Sundar Rabindranath
|
0cfe7d386d
|
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-21 09:28:53 +08:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Yu Chin Fabian Lim
|
06dd08256f
|
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. (#14617)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-03-21 00:44:37 +00:00 |
|
clark
|
b89d89f456
|
fix rebase
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:32:21 +08:00 |
|
clark
|
8355358fb3
|
add unlimited HWM
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
c0b1443345
|
fix mypy
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
d35dace985
|
refactor zmq msg to object
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
912031ceb5
|
refactor disagg
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
4f13e89143
|
fix SIM105
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
b9a7dbe769
|
remove default socket address value
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
0cb2e05256
|
change log level and fix some comments
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
d6945ecdf0
|
change disagg_prefill example to use zmq
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
298298f97d
|
remove invalid zmq benchmark code
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
6c8fae82dd
|
run format
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
16ed827378
|
add benchmark shell
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:08 +08:00 |
|
clark
|
8fa9df7987
|
run format.sh
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
27c1afe88b
|
fix ThreadProxy
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
ee6607332e
|
create proxy sockets in the proxy function for thread safety
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
7fbf70db57
|
1. replace tcp:// with ipc://
2. fix json response
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
2c31e4c3ea
|
Run yapf and ruff
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
187f112ccd
|
fix mypy issue
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
897db7b93d
|
Replace zmq.asyncio.Context().term() with zmq.asyncio.Context().destroy(linger=0) for immediate termination
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
b7ffb43792
|
update disagg_connect test_request.py
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
6e1fba8a73
|
1. connect_parser: make --prefill-addr and --decode-addr required
2. Rename connect.py to disagg_connector.py to more accurately reflect its purpose.
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
bfde1688e7
|
add /v1/completions stream support
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
905424ed65
|
add identity url headers
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:15:42 +08:00 |
|
clark
|
5d20f389d6
|
add vllm connect cmd
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:15:42 +08:00 |
|
clark
|
2a0cb78016
|
add test py
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:15:42 +08:00 |
|
Woosuk Kwon
|
2b22290ce0
|
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 15:24:16 -07:00 |
|
Jason
|
d8e82bc06d
|
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
|
2025-03-20 10:01:02 -07:00 |
|
Chi Zhang
|
086b56824c
|
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-21 00:30:04 +08:00 |
|
Harry Mellor
|
5a0905ba2a
|
Replace misc issues with link to forum (#15226)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 23:18:20 +08:00 |
|
Richard Liu
|
a8f12a63fd
|
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-20 14:59:33 +00:00 |
|
Harry Mellor
|
69ae2380c6
|
Add user forum to README (#15220)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 22:39:51 +08:00 |
|
Cyrus Leung
|
27261e40a6
|
[Bugfix] Multi-video inference on LLaVA-Onevision (#15082)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 14:10:45 +00:00 |
|
Quang-Linh LE
|
e3f813c33b
|
[macOS] Upgrade PyTorch to 2.6.0 (#15129)
|
2025-03-20 01:22:40 -07:00 |
|
Wang Ran (汪然)
|
c607a2652b
|
Fixing Imprecise Type Annotations (#15192)
|
2025-03-20 01:19:55 -07:00 |
|
Kevin H. Luu
|
3d45e3d749
|
[release] Tag vllm-cpu with latest upon new version released (#15193)
|
2025-03-20 01:19:10 -07:00 |
|