Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-15 09:45:01 +08:00)
Add RLHF document (#14482)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit cfd0ae8234 (parent 7caff01a7b)
```diff
@@ -14,13 +14,14 @@ EXAMPLE_DOC_DIR = ROOT_DIR / "docs/source/getting_started/examples"
 def fix_case(text: str) -> str:
     subs = {
         "api": "API",
-        "Cli": "CLI",
+        "cli": "CLI",
         "cpu": "CPU",
         "llm": "LLM",
         "tpu": "TPU",
         "aqlm": "AQLM",
         "gguf": "GGUF",
         "lora": "LoRA",
+        "rlhf": "RLHF",
         "vllm": "vLLM",
         "openai": "OpenAI",
         "multilora": "MultiLoRA",
```
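The substitution table in the hunk above can be exercised with a small sketch. The diff shows only the table, so the application logic here (whole-word, case-insensitive regex replacement) is an assumption, not the script's actual implementation:

```python
import re

# Hypothetical reconstruction of fix_case: the substitution table matches
# the diff, but the replacement logic below is an assumption.
def fix_case(text: str) -> str:
    subs = {
        "api": "API",
        "cli": "CLI",
        "cpu": "CPU",
        "llm": "LLM",
        "tpu": "TPU",
        "aqlm": "AQLM",
        "gguf": "GGUF",
        "lora": "LoRA",
        "rlhf": "RLHF",
        "vllm": "vLLM",
        "openai": "OpenAI",
        "multilora": "MultiLoRA",
    }
    for pattern, repl in subs.items():
        # \b keeps "llm" from matching inside "vllm"
        text = re.sub(rf"\b{pattern}\b", repl, text, flags=re.IGNORECASE)
    return text

print(fix_case("rlhf with vllm"))  # RLHF with vLLM
```

Lower-casing the `"Cli"` key matters under this reading: if lookups are done against lower-cased patterns, a mixed-case key would never match.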
```diff
@@ -105,6 +105,7 @@ features/compatibility_matrix
 :maxdepth: 1
 
 training/trl.md
+training/rlhf.md
 
 :::
```
|
|||||||
11
docs/source/training/rlhf.md
Normal file
11
docs/source/training/rlhf.md
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
# Reinforcement Learning from Human Feedback
|
||||||
|
|
||||||
|
Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.
|
||||||
|
|
||||||
|
vLLM can be used to generate the completions for RLHF. The best way to do this is with libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [verl](https://github.com/volcengine/verl).
|
||||||
|
|
||||||
|
See the following basic examples to get started if you don't want to use an existing library:
|
||||||
|
|
||||||
|
- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
|
||||||
|
- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
|
||||||
|
- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
|
||||||
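The first linked example places training and inference in separate processes on separate GPUs. A framework-free toy (an assumption for illustration only, using no vLLM or OpenRLHF code) of that pattern, where a trainer pushes updated weights to an inference worker over a pipe and then requests completions:

```python
import multiprocessing as mp

# Toy sketch of the separate-process RLHF pattern: the trainer process
# pushes new weights to an inference process, which serves generation
# requests against whatever weights it last received.

def inference_worker(conn):
    """Runs in its own process; stands in for a vLLM inference engine."""
    weights_version = None
    while True:
        kind, payload = conn.recv()
        if kind == "sync":
            weights_version = payload  # trainer pushed updated weights
        elif kind == "generate":
            # A real worker would run the model here; we echo instead.
            conn.send(f"v{weights_version}: completion for {payload!r}")
        elif kind == "stop":
            return

def demo():
    parent, child = mp.Pipe()
    proc = mp.Process(target=inference_worker, args=(child,))
    proc.start()
    parent.send(("sync", 1))                    # push weights, version 1
    parent.send(("generate", "What is RLHF?"))  # request a completion
    out = parent.recv()
    parent.send(("stop", None))
    proc.join()
    return out

if __name__ == "__main__":
    print(demo())  # v1: completion for 'What is RLHF?'
```

The linked examples replace the echo with real generation and the pipe with Ray or NCCL-based weight broadcasts, but the control flow (sync weights, then generate) is the same shape.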