From 1b86bd8e183138236415cc798f1beb3357e4f5eb Mon Sep 17 00:00:00 2001
From: Michael Goin
Date: Tue, 7 Oct 2025 16:59:41 -0400
Subject: [PATCH] Add more libraries to rlhf.md (#26374)

Signed-off-by: Michael Goin
---
 docs/training/rlhf.md | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/docs/training/rlhf.md b/docs/training/rlhf.md
index d07f32b80448a..b207c9ed373b8 100644
--- a/docs/training/rlhf.md
+++ b/docs/training/rlhf.md
@@ -1,8 +1,19 @@
 # Reinforcement Learning from Human Feedback
 
-Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate the completions for RLHF.
 
-vLLM can be used to generate the completions for RLHF. Some ways to do this include using libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [verl](https://github.com/volcengine/verl) and [unsloth](https://github.com/unslothai/unsloth).
+The following open-source RL libraries use vLLM for fast rollouts (sorted alphabetically and non-exhaustive):
+
+- [Cosmos-RL](https://github.com/nvidia-cosmos/cosmos-rl)
+- [NeMo-RL](https://github.com/NVIDIA-NeMo/RL)
+- [Open Instruct](https://github.com/allenai/open-instruct)
+- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
+- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
+- [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
+- [SkyRL](https://github.com/NovaSky-AI/SkyRL)
+- [TRL](https://github.com/huggingface/trl)
+- [Unsloth](https://github.com/unslothai/unsloth)
+- [verl](https://github.com/volcengine/verl)
 
 See the following basic examples to get started if you don't want to use an existing library:
 
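For context on the doc change above, here is a minimal sketch of the rollout step these libraries delegate to vLLM, using vLLM's offline `LLM`/`SamplingParams` API. The model id, prompts, and sampling settings below are illustrative assumptions, not part of the patch:

```python
# Minimal sketch: generating RLHF rollout completions with vLLM's
# offline inference API. The model id, prompts, and sampling values
# are illustrative placeholders, not taken from the patch.
from vllm import LLM, SamplingParams

# Load the current policy model (any HF-compatible model id works).
llm = LLM(model="facebook/opt-125m")

# RLHF rollouts typically sample several completions per prompt.
sampling_params = SamplingParams(n=4, temperature=0.8, max_tokens=256)

prompts = ["Explain why the sky is blue."]
outputs = llm.generate(prompts, sampling_params)

for request_output in outputs:
    for completion in request_output.outputs:
        # Each sampled completion would then be scored by a reward
        # model before the policy update step.
        print(completion.text)
```

The libraries listed in the patch wrap this same generate step with weight syncing and training loops; this sketch only shows the generation side that vLLM provides.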