From be0b399d7498c00a0d66eb6cee2a0fe3c9b2838f Mon Sep 17 00:00:00 2001
From: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Date: Sat, 8 Mar 2025 08:35:07 +0100
Subject: [PATCH] Add training doc signposting to TRL (#14439)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
---
 docs/source/index.md        |  8 ++++++++
 docs/source/training/trl.md | 13 +++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 docs/source/training/trl.md

diff --git a/docs/source/index.md b/docs/source/index.md
index 0bd8e12d088a..3db79456a4e4 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -100,6 +100,14 @@ features/compatibility_matrix
 
 % Details about running vLLM
 
+:::{toctree}
+:caption: Training
+:maxdepth: 1
+
+training/trl.md
+
+:::
+
 :::{toctree}
 :caption: Inference and Serving
 :maxdepth: 1
diff --git a/docs/source/training/trl.md b/docs/source/training/trl.md
new file mode 100644
index 000000000000..ebdf593dbde5
--- /dev/null
+++ b/docs/source/training/trl.md
@@ -0,0 +1,13 @@
+# Transformer Reinforcement Learning
+
+Transformer Reinforcement Learning (TRL) is a full-stack library that provides tools for training transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), and Reward Modeling. The library is integrated with 🤗 transformers.
+
+Online methods such as GRPO or Online DPO require the model to generate completions during training. vLLM can be used to generate these completions!
+
+See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+
+:::{seealso}
+For more information on the `use_vllm` flag, which you can set in the configs of these online methods, see:
+- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
+- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+:::