From be0b399d7498c00a0d66eb6cee2a0fe3c9b2838f Mon Sep 17 00:00:00 2001
From: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Date: Sat, 8 Mar 2025 08:35:07 +0100
Subject: [PATCH] Add training doc signposting to TRL (#14439)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
---
 docs/source/index.md        |  8 ++++++++
 docs/source/training/trl.md | 13 +++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 docs/source/training/trl.md

diff --git a/docs/source/index.md b/docs/source/index.md
index 0bd8e12d088a..3db79456a4e4 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -100,6 +100,14 @@ features/compatibility_matrix
 
 % Details about running vLLM
 
+:::{toctree}
+:caption: Training
+:maxdepth: 1
+
+training/trl.md
+
+:::
+
 :::{toctree}
 :caption: Inference and Serving
 :maxdepth: 1
diff --git a/docs/source/training/trl.md b/docs/source/training/trl.md
new file mode 100644
index 000000000000..ebdf593dbde5
--- /dev/null
+++ b/docs/source/training/trl.md
@@ -0,0 +1,13 @@
+# Transformers Reinforcement Learning
+
+Transformers Reinforcement Learning (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+
+Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
+
+See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+
+:::{seealso}
+For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
+- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
+- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+:::
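
Reviewer note (not part of the patch): the new page points readers at the `use_vllm` flag without showing it in use. The sketch below shows how the flag might be set on a GRPO config. Only `use_vllm` and `trl.GRPOConfig` come from the patch. The output directory name, the `GRPO_KWARGS` dictionary, and the `build_grpo_config` helper are hypothetical names chosen for illustration, and the TRL guide linked in the patch remains the authoritative reference for any further vLLM setup.

```python
# Hypothetical settings for illustration; only `use_vllm` comes from the patch above.
GRPO_KWARGS = {
    "output_dir": "grpo-vllm-demo",  # assumed output directory name
    "use_vllm": True,                # ask TRL to generate completions with vLLM
}


def build_grpo_config():
    """Build a TRL GRPOConfig with vLLM generation enabled.

    TRL is imported lazily, so this module can be imported even where
    TRL is not installed; calling the function does require TRL.
    """
    from trl import GRPOConfig

    return GRPOConfig(**GRPO_KWARGS)
```

Under the same assumptions, the returned object would be passed as the `args` of a GRPO trainer. Per the `seealso` block in the patch, `trl.OnlineDPOConfig` documents a `use_vllm` flag as well.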