36 Commits

Author SHA1 Message Date
Alex Brooks
ead2110297
[Core][Bugfix] Fix Online MM Beam Search (#19688)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-06-19 17:18:07 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Alex Brooks
321331b8ae
[Core] Add Lora Support to Beam Search (#18346)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-28 08:58:24 -07:00
Cyrus Leung
61e0a506a3
[Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-12 22:40:19 -07:00
Cyrus Leung
cb234955df
[Misc] Clean up input processing (#17582)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-02 08:11:53 -07:00
Gabriel Marinho
1c2bc7ead0
Truncation control for embedding models (#14776)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-04-30 09:24:57 +08:00
Xihui Cang
1666e66443
Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572)
Signed-off-by: Xihui Cang <xihuicang@gmail.com>
2025-04-15 11:50:38 +00:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
Nick Hill
da6ea29f7a
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-20 22:24:10 -07:00
maobaolong
26dd972adb
[FEAT]Support reset prefix cache by specified device (#15003)
2025-03-19 10:54:41 -07:00
Jun Duan
74bc397b0a
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
youkaichao
ba81163997
[core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
    This commit adds SPDX license headers to python source files as
    recommended to the project by the Linux Foundation. These headers provide
    a concise way that is both human and machine readable for communicating
    license information for each source file. It helps avoid any ambiguity
    about the license of the code and can also be easily used by tools to
    help manage license compliance.
    
    The Linux Foundation runs license scans against the codebase to help
    ensure we are in compliance with the licenses of the code we use,
    including dependencies. Having these headers in place helps that tool do
    its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Cody Yu
7206ce4ce1
[Core] Support reset_prefix_cache (#12284)
2025-01-22 18:52:27 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Cyrus Leung
cad5c0a6ed
[Doc] Update docs to refer to pooling models (#11093)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 13:36:27 +00:00
Cyrus Leung
d2f058e76c
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 14:36:51 +08:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
Cyrus Leung
bbc3619dc8
[Core] Make encoder-decoder inputs a nested structure to be more composable (#9604)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-05 10:07:31 +08:00
Zhong Qishuai
ef7865b4f9
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai <Ferdinandzhong@gmail.com>
2024-10-29 11:49:47 +00:00
Robert Shaw
feb92fbe4a
Fix beam search eos (#9627)
2024-10-28 06:59:37 +00:00
Nick Hill
1ffc8a7362
[BugFix] Typing fixes to RequestOutput.prompt and beam search (#9473)
2024-10-18 07:19:53 +00:00
Brendan Wong
4d31cd424b
[Frontend] merge beam search implementations (#9296)
2024-10-14 15:05:52 -07:00
Sebastian Schoennenbeck
35bd215168
[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)
2024-10-01 09:58:06 +00:00
Cyrus Leung
3b00b9c26c
[Core] rename PromptInputs and inputs (#8876)
2024-09-26 20:35:15 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)
2024-09-25 10:36:26 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility (#8760)
2024-09-25 09:36:47 -07:00
Simon Mo
3185fb0cca
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750)
2024-09-24 05:45:20 +00:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs and inputs (#8673)
2024-09-20 19:00:54 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652)
2024-08-29 19:19:08 -07:00
William Lin
dd53c4b023
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Robert Shaw
f7e3b0c5aa
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-21 13:34:14 -04:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Nick Hill
9a3f49ae07
[BugFix] Overhaul async request cancellation (#7111)
2024-08-07 13:21:41 +08:00
Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00