From 7a0a9da72bad7c649d4c74b363040e0744e5d37d Mon Sep 17 00:00:00 2001
From: Varun Sundar Rabindranath
Date: Thu, 24 Apr 2025 23:17:22 -0400
Subject: [PATCH] [Doc] V1 : Update LoRA status (#17133)

Signed-off-by: varun sundar rabindranath
Co-authored-by: varun sundar rabindranath
---
 docs/source/getting_started/v1_user_guide.md | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/docs/source/getting_started/v1_user_guide.md b/docs/source/getting_started/v1_user_guide.md
index a87484c3bb042..de90b8a7851e6 100644
--- a/docs/source/getting_started/v1_user_guide.md
+++ b/docs/source/getting_started/v1_user_guide.md
@@ -44,8 +44,8 @@ This living user guide outlines a few known **important changes and limitations*
 |-----------------|-----------------------------------------------------------------------------------|
 | **Prefix Caching** | 🚀 Optimized |
 | **Chunked Prefill** | 🚀 Optimized |
+| **LoRA** | 🚀 Optimized |
 | **Logprobs Calculation** | 🟢 Functional |
-| **LoRA** | 🟢 Functional ([PR #13096](https://github.com/vllm-project/vllm/pull/13096))|
 | **Multimodal Models** | 🟢 Functional |
 | **FP8 KV Cache** | 🟢 Functional on Hopper devices ([PR #15191](https://github.com/vllm-project/vllm/pull/15191))|
 | **Spec Decode** | 🚧 WIP ([PR #13933](https://github.com/vllm-project/vllm/pull/13933))|
@@ -121,11 +121,6 @@ Although we have re-implemented and partially optimized many features and models
 
 These features are already supported in vLLM V1, but their optimization is still in progress.
 
-- **LoRA**: LoRA is functionally working on vLLM V1 but its performance is
-  inferior to that of V0. The team is actively working on improving its
-  performance
-(e.g., see [PR #13096](https://github.com/vllm-project/vllm/pull/13096)).
-
 - **Spec Decode**: Currently, only ngram-based spec decode is supported in V1. There
   will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)).
   We will prioritize the support for Eagle, MTP compared to draft model based spec decode.