clark
|
c0b1443345
|
fix mypy
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
d35dace985
|
refactor zmq msg to object
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
912031ceb5
|
refactor disagg
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:20:12 +08:00 |
|
clark
|
4f13e89143
|
fix SIM105
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
b9a7dbe769
|
remove default socket address value
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
0cb2e05256
|
change log level and fix some comments
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
d6945ecdf0
|
change disagg_prefill example to use zmq
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
6c8fae82dd
|
run format
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:19 +08:00 |
|
clark
|
16ed827378
|
add benchmark shell
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:18:08 +08:00 |
|
clark
|
8fa9df7987
|
run format.sh
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
27c1afe88b
|
fix ThreadProxy
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
ee6607332e
|
create proxy sockets in the proxy function for thread safety
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
7fbf70db57
|
1. replace tcp:// with ipc://
2. fix json response
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
2c31e4c3ea
|
Run yapf and ruff
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:57 +08:00 |
|
clark
|
187f112ccd
|
1. fix mypy issue
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
897db7b93d
|
Replace zmq.asyncio.Context().term() with zmq.asyncio.Context().destroy(linger=0) for immediate termination
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
6e1fba8a73
|
1. connect_parser: make --prefill-addr and --decode-addr required
2. Rename connect.py to disagg_connector.py to more accurately reflect its purpose.
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
bfde1688e7
|
add /v1/completions stream support
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:17:44 +08:00 |
|
clark
|
905424ed65
|
add identity url headers
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:15:42 +08:00 |
|
clark
|
5d20f389d6
|
add vllm connect cmd
Signed-off-by: clark <panf2333@gmail.com>
|
2025-03-21 08:15:42 +08:00 |
|
Woosuk Kwon
|
2b22290ce0
|
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 15:24:16 -07:00 |
|
Jason
|
d8e82bc06d
|
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
|
2025-03-20 10:01:02 -07:00 |
|
Richard Liu
|
a8f12a63fd
|
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-20 14:59:33 +00:00 |
|
Cyrus Leung
|
27261e40a6
|
[Bugfix] Multi-video inference on LLaVA-Onevision (#15082)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 14:10:45 +00:00 |
|
Wang Ran (汪然)
|
c607a2652b
|
Fixing Imprecise Type Annotations (#15192)
|
2025-03-20 01:19:55 -07:00 |
|
billishyahao
|
742369d35a
|
[Frontend][Bugfix] support prefill decode disaggregation on deepseek (#14824)
Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com>
|
2025-03-20 00:00:33 -07:00 |
|
Wang Ran (汪然)
|
bfe2fe0af4
|
typo: Update config.py (#15189)
|
2025-03-19 23:31:21 -07:00 |
|
Matt Ritter
|
a8652f4f0f
|
Enable CUDA graph support for llama 3.2 vision (#14917)
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>
|
2025-03-19 23:29:16 -07:00 |
|
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
|
Chauncey
|
ae65f3e237
|
[Misc] Fix disabling of HTTP request logs (#14754)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-19 21:53:40 -07:00 |
|
Russell Bryant
|
1f16b7fe74
|
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-19 21:33:51 -07:00 |
|
Nicolò Lucchesi
|
d8c6d7d6b5
|
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-19 21:00:39 -07:00 |
|
Cyrus Leung
|
ffa443afed
|
[Bugfix] Fix embedding assignment for InternVL-based models (#15086)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-20 03:40:13 +00:00 |
|
Nick Hill
|
c47aafa37c
|
[BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 01:30:43 +00:00 |
|
Alexander Matveev
|
cfbca8a2f2
|
[V1] TPU - Tensor parallel MP support (#15059)
|
2025-03-20 00:55:18 +00:00 |
|
Nick Hill
|
22d33baca2
|
[FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests (#15150)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-19 21:04:41 +00:00 |
|
iefgnoix
|
b0e96aaebb
|
[V1][TPU] Change kv cache shape. (#15145)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-19 12:16:42 -07:00 |
|
Wang Ran (汪然)
|
8310e0b59b
|
simple bugfix: Update stats.py (#15139)
|
2025-03-19 18:26:27 +00:00 |
|
maobaolong
|
26dd972adb
|
[FEAT] Support resetting the prefix cache by specified device (#15003)
|
2025-03-19 10:54:41 -07:00 |
|
Alessandro Sangiorgi
|
374ee287d8
|
[Frontend] Remove custom_cache_manager (#13791)
Signed-off-by: fulvius31 <asangior@redhat.com>
|
2025-03-20 00:13:50 +08:00 |
|
Jan Kaniecki
|
8363cd093d
|
[Bugfix] Adjust mllama to regional compilation (#15112)
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
|
2025-03-19 07:57:25 -07:00 |
|
Cyrus Leung
|
3d446433ec
|
[Bugfix] Fix size calculation of processing cache (#15114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 05:53:19 -07:00 |
|
Cyrus Leung
|
1fe0fd12d3
|
[Misc] Avoid unnecessary HF do_rescale warning when passing dummy data (#15107)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 03:42:31 -07:00 |
|
Roger Wang
|
dafb4e504a
|
[V1][Bugfix] Fix oracle for device checking (#15104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-19 18:35:32 +08:00 |
|
Cyrus Leung
|
61f412187d
|
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 23:58:22 -07:00 |
|
Woosuk Kwon
|
05ccd0aa35
|
[V1] Ensure using int64 for sampled token ids (#15065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 23:52:19 -07:00 |
|
Cyrus Leung
|
f690372b68
|
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 13:49:33 +08:00 |
|
Brayden Zhong
|
8b3e94a357
|
[Model] Remove duplicated message check in Mistral chat completion request (#15069)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-19 05:09:32 +00:00 |
|
Julien Denize
|
437f9162d0
|
[Model] Pixtral: Remove layer instantiation duplication (#15053)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-03-19 10:34:03 +08:00 |
|
Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|