[Docs] Update features/disagg_prefill, add v1 examples and development (#22165)

Signed-off-by: David Chen <530634352@qq.com>
2025-12-09 05:34:55 +08:00 · 2025-08-07 15:59:23 +08:00 · 2025-08-07 15:59:23 +08:00 · 289b18e670
commit 289b18e670
parent 35171b1172
3 changed files with 25 additions and 0 deletions
--- a/docs/assets/features/disagg_prefill/high_level_design.png
+++ b/docs/assets/features/disagg_prefill/high_level_design.png
--- a/docs/assets/features/disagg_prefill/workflow.png
+++ b/docs/assets/features/disagg_prefill/workflow.png
--- a/docs/features/disagg_prefill.md
+++ b/docs/features/disagg_prefill.md
@@ -19,6 +19,18 @@ Two main reasons:

 Please refer to <gh-file:examples/online_serving/disaggregated_prefill.sh> for the example usage of disaggregated prefilling.

+vLLM now supports five types of connectors:
+
+- **SharedStorageConnector**: refer to <gh-file:examples/offline_inference/disaggregated-prefill-v1/run.sh> for the example usage of SharedStorageConnector disaggregated prefilling.
+- **LMCacheConnectorV1**: refer to <gh-file:examples/others/lmcache/disagg_prefill_lmcache_v1/disagg_example_nixl.sh> for the example usage of LMCacheConnectorV1 disaggregated prefilling, which uses NIXL for the underlying KV transmission.
+- **NixlConnector**: refer to <gh-file:tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh> for the example usage of NixlConnector disaggregated prefilling, which supports fully asynchronous send/recv.
+- **P2pNcclConnector**: refer to <gh-file:examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_example_p2p_nccl_xpyd.sh> for the example usage of P2pNcclConnector disaggregated prefilling.
+- **MultiConnector**: uses the `kv_connector_extra_config: dict[str, Any]` field already present in `KVTransferConfig` to hold all the desired connectors as an ordered list of kwargs, for example:
+
+  ```bash
+  --kv-transfer-config '{"kv_connector":"MultiConnector","kv_role":"kv_both","kv_connector_extra_config":{"connectors":[{"kv_connector":"NixlConnector","kv_role":"kv_both"},{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path":"local_storage"}}]}}'
+  ```
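The MultiConnector value above is a nested JSON object, which is easy to mistype on the command line. As a minimal sketch (plain Python with no vLLM import; the variable names are illustrative and not part of vLLM), the same value can be assembled from dictionaries and serialized with `json.dumps`:

```python
import json

# Child connectors, in the order they should appear in the list.
# The keys and values mirror the command-line example above.
child_connectors = [
    {"kv_connector": "NixlConnector", "kv_role": "kv_both"},
    {
        "kv_connector": "SharedStorageConnector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {"shared_storage_path": "local_storage"},
    },
]

# The outer MultiConnector entry carries the children in its extra config.
multi_connector_config = {
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {"connectors": child_connectors},
}

# Compact separators give the single-line form used on the command line.
kv_transfer_config_json = json.dumps(multi_connector_config, separators=(",", ":"))
print(f"--kv-transfer-config '{kv_transfer_config_json}'")
```

Building the value from dictionaries keeps the nesting of `kv_connector_extra_config` visible and lets the JSON be checked before it is passed to the server.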
+
 ## Benchmarks

 Please refer to <gh-file:benchmarks/disagg_benchmarks> for disaggregated prefilling benchmarks.
@@ -48,6 +60,19 @@ The workflow of disaggregated prefilling is as follows:

 The `buffer` corresponds to `insert` API in LookupBuffer, and the `drop_select` corresponds to `drop_select` API in LookupBuffer.

+Every process in vLLM now has a corresponding connector. Specifically, there are:
+
+- Scheduler connector: the connector that resides in the same process as the scheduler. It schedules the KV cache transfer ops.
+- Worker connectors: the connectors that reside in the worker processes. They execute the KV cache transfer ops.
+
+The following figure illustrates how these two kinds of connectors are organized:
+
+![Disaggregated prefilling high level design](../assets/features/disagg_prefill/high_level_design.png)
+
+The figure below shows how the worker connector works with the attention module to achieve layer-by-layer KV cache store and load:
+
+![Disaggregated prefilling workflow](../assets/features/disagg_prefill/workflow.png)
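The division of labor described above (the scheduler connector decides which transfer ops to run, the worker connectors execute them layer by layer) can be sketched with a toy model. Everything below is hypothetical: the class names, the `TransferOp` structure, and the dictionary standing in for the transfer medium are illustrative only and are not vLLM's connector API. The ordering of load before a layer's attention and store after it is likewise an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TransferOp:
    """One KV cache transfer op (hypothetical structure)."""
    request_id: str
    kind: str  # "load" or "store"


class ToySchedulerConnector:
    """Runs next to the scheduler: only decides which ops should happen."""

    def schedule(self, load_ids, store_ids):
        return ([TransferOp(r, "load") for r in load_ids]
                + [TransferOp(r, "store") for r in store_ids])


@dataclass
class ToyWorkerConnector:
    """Runs in a worker: executes the scheduled ops one layer at a time."""
    num_layers: int
    transfer_medium: dict  # stands in for shared storage, NIXL, NCCL, ...
    local_kv: dict = field(default_factory=dict)  # stands in for the KV cache

    def execute(self, ops):
        for layer in range(self.num_layers):
            # Assumed ordering: load this layer's KV before attention runs.
            for op in ops:
                if op.kind == "load":
                    key = (op.request_id, layer)
                    self.local_kv[key] = self.transfer_medium[key]
            # Attention for this layer would run here. For "store" ops, a
            # placeholder string stands in for the KV it would produce.
            for op in ops:
                if op.kind == "store":
                    key = (op.request_id, layer)
                    kv = f"kv[{op.request_id}][layer {layer}]"
                    self.local_kv[key] = kv
                    self.transfer_medium[key] = kv


medium = {}
scheduler = ToySchedulerConnector()
prefill_worker = ToyWorkerConnector(num_layers=2, transfer_medium=medium)
decode_worker = ToyWorkerConnector(num_layers=2, transfer_medium=medium)

# The prefill side stores KV for "req-0"; the decode side then loads it.
prefill_worker.execute(scheduler.schedule(load_ids=[], store_ids=["req-0"]))
decode_worker.execute(scheduler.schedule(load_ids=["req-0"], store_ids=[]))
print(decode_worker.local_kv == prefill_worker.local_kv)
```

The point of the sketch is that the scheduler-side object never touches KV data. It only emits ops, so the transfer itself can be interleaved with per-layer computation inside each worker.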
+
 ## Third-party contributions

 Disaggregated prefilling is highly related to infrastructure, so vLLM relies on third-party connectors for production-level disaggregated prefilling (and vLLM team will actively review and merge new PRs for third-party connectors).