xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git (last synced 2026-06-24 02:47:11 +08:00)

Author	SHA1	Message	Date
Mark McLoughlin	f17f1d4608	[V1][Metrics] Add GPU cache usage % gauge (#12561 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-29 18:31:01 -08:00
Mark McLoughlin	46fb056749	[V1][Metrics] Add TTFT and TPOT histograms (#12530 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-29 04:11:16 +00:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Mark McLoughlin	c386c43ca3	[V1][Metrics] Add per-request prompt/generation_tokens histograms (#12516 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-28 22:07:22 +00:00
Mark McLoughlin	3fd1fb63ef	[V1][Metrics] Hook up IterationStats for Prometheus metrics (#12478 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-28 16:38:38 +00:00
Mark McLoughlin	01ba927040	[V1][Metrics] Add initial Prometheus logger (#12416 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-27 12:26:28 -05:00
Pooya Davoodi	0cc6b383d7	[Frontend] Support scores endpoint in run_batch (#12430 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-01-27 04:30:17 +00:00
Kyle Mistele	0034b09ceb	[Frontend] Rerank API (Jina- and Cohere-compatible API) (#12376 ) Signed-off-by: Kyle Mistele <kyle@mistele.com>	2025-01-26 19:58:45 -07:00
Matthew Hendrey	9ddc35220b	[Frontend] generation_config.json for maximum tokens(#12242 ) Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-01-26 19:59:25 +08:00
Wallas Henrique	58fd57ff1d	[Bugfix] Fix score api for missing max_model_len validation (#12119 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2025-01-17 16:24:22 +00:00
youkaichao	87a0c076af	[core] allow callable in collective_rpc (#12151 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-17 20:47:01 +08:00
Jee Jee Li	07934cc237	[Misc][LoRA] Improve the readability of LoRA error messages (#12102 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-17 19:32:28 +08:00
Isotr0py	d75ab55f10	[Misc] Add deepseek_vl2 chat template (#12143 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-17 06:34:48 +00:00
Joe Runde	edce722eaa	[Bugfix] use right truncation for non-generative tasks (#12050 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-01-16 00:31:01 +08:00
Joe Runde	ac2f3f7fee	[Bugfix] Validate lora adapters to avoid crashing server (#11727 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-10 15:56:36 +08:00
Cyrus Leung	9a228348d2	[Misc] Provide correct Pixtral-HF chat template (#11891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 10:19:37 -07:00
Maximilien de Bayser	1fe554bac3	treat do_lower_case in the same way as the sentence-transformers library (#11815 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-01-09 11:05:43 +08:00
Joe Runde	4db72e57f6	[Bugfix][Refactor] Unify model management in frontend (#11660 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-01-01 02:21:51 +00:00
Michael Goin	74fa1d123c	[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-31 03:43:54 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Cyrus Leung	7af553ea30	[Misc] Abstract the logic for reading and writing media content (#11527 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 19:21:23 +08:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Michael Goin	63afbe9215	[CI] Expand OpenAI test_chat.py guided decoding tests (#11048 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-23 18:35:38 +00:00
Michael Goin	5bfb30a529	[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-23 23:06:20 +08:00
Roger Wang	29c748930e	[CI] Fix flaky entrypoint tests (#11403 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-21 21:08:44 -08:00
Yanyi Liu	5aef49806d	[Feature] Add load generation config from model (#11164 ) Signed-off-by: liuyanyi <wolfsonliu@163.com> Signed-off-by: Yanyi Liu <wolfsonliu@163.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-19 10:50:38 +00:00
Michael Goin	a30482f054	[CI] Expand test_guided_generate to test all backends (#11313 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-19 04:00:38 +00:00
Michael Goin	c77eb8a33c	[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264 )	2024-12-17 16:34:06 -08:00
Joe Runde	2d1b9baa8f	[Bugfix] Fix request cancellation without polling (#11190 )	2024-12-17 12:26:32 -08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
Michael Goin	0064f697d3	[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-17 11:39:58 +08:00
youkaichao	551603feff	[core] overhaul memory profiling and fix backward compatibility (#10511 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 13:32:25 -08:00
Isotr0py	d927dbcd88	[Model] Refactor Ultravox to use merged input processor (#11198 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-16 10:09:53 +00:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150 ) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Isotr0py	a811dd6608	[Model] merged input processor for Phi-3-Vision models (#10977 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-09 12:55:10 -08:00
Michael Goin	8d370e91cb	[Bugfix] Fallback to outlines for complex json schemas (#10899 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-05 11:14:06 +08:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
Cyrus Leung	d2f058e76c	[Misc] Rename embedding classes to pooling (#10801 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 14:36:51 +08:00
tomeras91	395b1c7454	[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635 ) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-27 13:21:10 -08:00
youkaichao	308cc5e21e	[ci] fix slow tests (#10698 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-27 09:26:14 -08:00
youkaichao	334d64d1e8	[ci] add vllm_test_utils (#10659 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-26 00:20:04 -08:00
Chauncey	d04b13a380	[Bug]: Authorization ignored when root_path is set (#10606 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-25 16:21:41 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
Varun Vinayak Shenoy	7d8ffb344f	[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567 ) Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>	2024-11-22 21:13:29 -08:00
Travis Johnson	9195dbdbca	[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-11-23 10:17:38 +08:00
Chauncey	da7e702c6f	[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-21 16:24:32 +00:00


199 Commits