Robert Shaw
e3b318216d
[ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-18 20:19:48 +00:00
Rui Qiao
bae888cb8e
[Bugfix] Clear engine reference in AsyncEngineRPCServer (#7618)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-16 20:44:05 -07:00
fzyzcjy
ec724a725e
support tqdm in notebooks (#7510)
2024-08-16 09:17:50 -07:00
Gordon Wong
0e39a33c6d
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513)
2024-08-16 10:05:18 -06:00
Nick Hill
9587b050fb
[Core] Use uvloop with zmq-decoupled front-end (#7570)
2024-08-15 22:48:07 -07:00
nunjunj
3b19e39dc5
Chat method for offline llm (#5049)
Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-08-15 19:41:34 -07:00
Grant Pinkert
f878c8feb0
[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453)
2024-08-16 02:38:08 +00:00
Michael Goin
9c8e2d1161
[Bugfix][Harmless] Fix float16 dtype for model_is_embedding (#7566)
2024-08-15 18:26:19 -07:00
jack
67d115db08
[Bugfix][Frontend] Disable embedding API for chat models (#7504)
Co-authored-by: jack <jack@alex>
2024-08-14 09:15:19 -07:00
youkaichao
33e5d7e6b6
[frontend] spawn engine process from api server process (#7484)
2024-08-13 15:40:17 -07:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models (#7446)
2024-08-13 17:39:33 +00:00
Andrew Wang
97a6be95ba
[Misc] improve logits processors logging message (#7435)
2024-08-13 02:29:34 +00:00
Rui Qiao
198d6a2898
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-12 17:57:16 -07:00
Pooya Davoodi
249b88228d
[Frontend] Support embeddings in the run_batch API (#7132)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-09 09:48:21 -07:00
Cyrus Leung
7eb4a51c5f
[Core] Support serving encoder/decoder models (#7258)
2024-08-09 10:39:41 +08:00
Joe Runde
21b9c49aa3
[Frontend] Kill the server on engine death (#6594)
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-08 09:47:48 -07:00
Maximilien de Bayser
fde47d3bc2
[BugFix] Fix frontend multiprocessing hang (#7217)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-08-07 18:09:36 +00:00
Robert Shaw
564985729a
[ BugFix ] Move zmq frontend to IPC instead of TCP (#7222)
2024-08-07 16:24:56 +00:00
Cyrus Leung
66d617e343
[Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-07 09:12:05 +00:00
Nick Hill
9a3f49ae07
[BugFix] Overhaul async request cancellation (#7111)
2024-08-07 13:21:41 +08:00
afeldman-nm
fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00
Aditya Paliwal
57f560aa23
[BugFix] Use args.trust_remote_code (#7121)
2024-08-05 09:26:14 -07:00
Nick Hill
003f8ee128
[BugFix] Use IP4 localhost form for zmq bind (#7163)
2024-08-05 08:41:03 -07:00
Cyrus Leung
cc08fc7225
[Frontend] Reapply "Factor out code for running uvicorn" (#7095)
2024-08-04 20:40:51 -07:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Cyrus Leung
8c025fa703
[Frontend] Factor out chat message parsing (#7055)
2024-08-02 21:31:27 -07:00
Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
zifeitong
3c10591ef2
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954)
2024-07-31 21:13:34 -07:00
Simon Mo
7eb0cb4a14
Revert "[Frontend] Factor out code for running uvicorn" (#7012)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-07-31 16:34:26 -07:00
Fei
c0644cf9ce
[Bugfix] fix logit processor exceed vocab size issue (#6927)
2024-07-31 16:16:01 +08:00
Cyrus Leung
da1f7cc12a
[mypy] Enable following imports for some directories (#6681)
2024-07-31 10:38:03 +08:00
Nick Hill
9f69d8245a
[Frontend] New allowed_token_ids decoding request parameter (#6753)
2024-07-29 23:37:27 +00:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Cyrus Leung
981b0d5673
[Frontend] Factor out code for running uvicorn (#6828)
2024-07-27 09:58:25 +08:00
Alphi
b75e314fff
[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-25 09:42:49 -07:00
Evan Z. Liu
5689e256ba
[Frontend] Represent tokens with identifiable strings (#6626)
2024-07-25 09:51:00 +08:00
Daniele
ee812580f7
[Frontend] split run_server into build_server and run_server (#6740)
2024-07-24 10:36:04 -07:00
LF Marques
545146349c
Adding f-string to validation error which is missing (#6748)
2024-07-24 08:55:53 -07:00
Yehoshua Cohen
58f53034ad
[Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652)
2024-07-23 11:41:55 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon (#6633)
2024-07-22 23:50:48 -07:00
Jiaxin Shan
42c7f66a38
[Core] Support dynamically loading Lora adapter from HuggingFace (#6234)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-22 15:42:40 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Cyrus Leung
d7f4178dd9
[Frontend] Move chat utils (#6602)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-21 08:38:17 +08:00
Daniele
51f8aa90ad
[Bugfix][Frontend] remove duplicate init logger (#6581)
2024-07-19 10:16:27 -07:00
Cyrus Leung
6366efc67b
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463)
2024-07-19 03:55:13 +00:00
Nick Hill
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
sasha0552
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
2024-07-16 20:18:09 +08:00
Joe
d92b3c5cde
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419)
2024-07-15 18:54:15 -07:00
zifeitong
b47008b4d2
[BugFix] BatchResponseData body should be optional (#6345)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-15 04:06:09 +00:00