22 Commits

Author SHA1 Message Date
Cyrus Leung
9edca6bf8f
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST (#9919)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00
zifeitong
47db6ec831
[Frontend] Add per-request number of cached token stats (#10174) 2024-11-12 16:42:28 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API (#9759) 2024-11-01 08:13:35 +00:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
youkaichao
f842a7aff1
[misc] remove engine_use_ray (#8126) 2024-09-11 18:23:36 -07:00
Pooya Davoodi
cea95dfb94
[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347) 2024-09-11 05:30:11 +00:00
Adam Lugowski
58fcc8545a
[Frontend] Add progress reporting to run_batch.py (#8060)
Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>
2024-09-09 11:16:37 -07:00
Pooya Davoodi
8da48e4d95
[Frontend] Publish Prometheus metrics in run_batch API (#7641) 2024-08-23 23:04:22 -07:00
Pooya Davoodi
6885fde317
[Bugfix] Fix run_batch logger (#7640) 2024-08-23 13:58:26 -07:00
Pooya Davoodi
249b88228d
[Frontend] Support embeddings in the run_batch API (#7132)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-09 09:48:21 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
zifeitong
ff9ddbceee
[Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756) 2024-06-22 03:33:12 +00:00
Michael Goin
8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
zifeitong
26e1188e51
[Fix] Use utf-8 encoding in entrypoints/openai/run_batch.py (#5606) 2024-06-17 23:16:10 +00:00
zifeitong
3ce2c050dd
[Fix] Correct OpenAI batch response format (#5554) 2024-06-15 16:57:54 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils (#5098) 2024-06-13 11:21:39 -07:00
Alex Wu
5e0391c040
[Frontend] Separate OpenAI Batch Runner usage from API Server (#4851) 2024-05-17 00:42:41 +09:00
Alex Wu
52f8107cf2
[Frontend] Support OpenAI batch file format (#4794)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-15 19:13:36 -04:00