Add chat doc in quick start (#21213)

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
TankNee 2025-08-03 22:47:55 +08:00 committed by GitHub
parent b5dfb94fa0
commit 83f7bbb318

@@ -98,6 +98,43 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
``` ```
!!! note
    The `llm.generate` method does not automatically apply the model's chat template to the input prompts. If you are using an instruct-tuned or chat model, you should therefore apply the corresponding chat template manually to get the expected behavior. Alternatively, you can use the `llm.chat` method and pass a list of messages in the same format as those passed to OpenAI's `client.chat.completions` API:
??? code
    ```python
    # Use the tokenizer to apply the model's chat template to each prompt.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("/path/to/chat_model")
    messages_list = [
        [{"role": "user", "content": prompt}]
        for prompt in prompts
    ]
    texts = tokenizer.apply_chat_template(
        messages_list,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Generate outputs from the templated prompts.
    outputs = llm.generate(texts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

    # Alternatively, use the chat interface, which applies the chat template automatically.
    outputs = llm.chat(messages_list, sampling_params)
    for idx, output in enumerate(outputs):
        prompt = prompts[idx]
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    ```
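
The snippet above assumes that `llm`, `sampling_params`, and `prompts` have already been defined, as in the basic offline-inference example earlier in this quickstart. For reference, a minimal setup might look like the following sketch; the model path and sampling values here are illustrative placeholders:

```python
from vllm import LLM, SamplingParams

# Example prompts; any list of strings works here.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Illustrative sampling settings; adjust as needed.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Placeholder model path; use any chat/instruct model you have access to.
llm = LLM(model="/path/to/chat_model")
```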
[](){ #quickstart-online }

## OpenAI-Compatible Server