[Doc] Fix Markdown Pre-commit Error (#24670)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
parent 404c85ca72
commit 4984a291d5
@@ -37,7 +37,7 @@

- The `supported_languages` mapping is validated at init time.
- Set `supports_transcription_only=True` if the model should not serve text generation (e.g., Whisper).

- Provide an ASR configuration via [get_speech_to_text_config][vllm.model_executor.models.interfaces.SupportsTranscription.get_speech_to_text_config].
This is for controlling general behavior of the API when serving your model:

??? code
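The code block collapsed under `??? code` lies outside this hunk. As a rough sketch of what a `get_speech_to_text_config` override can look like (field values are illustrative placeholders, not documented defaults, and the import path and signature should be double-checked against `interfaces.py` for your vLLM version):

```python
from vllm.config import ModelConfig, SpeechToTextConfig


# Inside your model class (which implements SupportsTranscription):
@classmethod
def get_speech_to_text_config(
    cls, model_config: ModelConfig, task_type: str
) -> SpeechToTextConfig:
    # Illustrative values only; pick what matches your model.
    return SpeechToTextConfig(
        sample_rate=16_000,   # resample incoming audio to this rate
        max_audio_clip_s=30,  # let the server chunk longer clips
    )
```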
@@ -65,7 +65,7 @@

- Implement the prompt construction via [get_generation_prompt][vllm.model_executor.models.interfaces.SupportsTranscription.get_generation_prompt]. The server passes you the resampled waveform and task parameters; you return a valid [PromptType][vllm.inputs.data.PromptType]. There are two common patterns:

-#### A. Multimodal LLM with audio embeddings (e.g., Voxtral, Gemma3n)
+### A. Multimodal LLM with audio embeddings (e.g., Voxtral, Gemma3n)

Return a dict containing `multi_modal_data` with the audio, and either a `prompt` string or `prompt_token_ids`:
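The pattern-A code block itself falls outside this hunk. A hedged sketch of the shape it can take follows; the `<audio>` placeholder and instruction text are invented for illustration (real models define their own), and the exact parameter list should be checked against the interface:

```python
from typing import Optional, cast

import numpy as np

from vllm.config import ModelConfig, SpeechToTextConfig
from vllm.inputs.data import PromptType


# Inside your model class:
@classmethod
def get_generation_prompt(
    cls,
    audio: np.ndarray,
    stt_config: SpeechToTextConfig,
    model_config: ModelConfig,
    language: Optional[str],
    task_type: str,
    request_prompt: str,
    to_language: Optional[str],
) -> PromptType:
    # Pattern A: pass the waveform through multi_modal_data and point
    # at it from the text prompt via the model's audio placeholder.
    prompt = {
        "multi_modal_data": {"audio": (audio, stt_config.sample_rate)},
        "prompt": f"<audio>Transcribe this audio. {request_prompt}",
    }
    return cast(PromptType, prompt)
```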
@@ -102,7 +102,7 @@

For further clarification on multimodal inputs, please refer to [Multi-Modal Inputs](../../features/multimodal_inputs.md).

-#### B. Encoder–decoder audio-only (e.g., Whisper)
+### B. Encoder–decoder audio-only (e.g., Whisper)

Return a dict with separate `encoder_prompt` and `decoder_prompt` entries:
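Again, the code block sits outside the hunk; a hedged sketch of the Whisper-style shape (the control tokens are illustrative, not copied verbatim from the implementation):

```python
from typing import Optional, cast

import numpy as np

from vllm.config import ModelConfig, SpeechToTextConfig
from vllm.inputs.data import PromptType


# Inside your model class:
@classmethod
def get_generation_prompt(
    cls,
    audio: np.ndarray,
    stt_config: SpeechToTextConfig,
    model_config: ModelConfig,
    language: Optional[str],
    task_type: str,
    request_prompt: str,
    to_language: Optional[str],
) -> PromptType:
    # Pattern B: the waveform feeds the encoder; the decoder prompt
    # carries the task and language control tokens.
    prompt = {
        "encoder_prompt": {
            "prompt": "",
            "multi_modal_data": {"audio": (audio, stt_config.sample_rate)},
        },
        "decoder_prompt":
            f"<|startoftranscript|><|{language}|><|{task_type}|>"
            f"<|notimestamps|>{request_prompt}",
    }
    return cast(PromptType, prompt)
```

The `return cast(PromptType, prompt)` tail visible in the next hunk is the closing line of exactly this kind of block.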
@@ -142,7 +141,6 @@
        return cast(PromptType, prompt)
```

-
- (Optional) Language validation via [validate_language][vllm.model_executor.models.interfaces.SupportsTranscription.validate_language]

If your model requires a language and you want a default, override this method (see Whisper):
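The override itself is outside the hunk; a hedged sketch of the behavior described (warn and default to English, as the doc attributes to Whisper):

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


# Inside your model class:
@classmethod
def validate_language(cls, language: Optional[str]) -> Optional[str]:
    # Sketch: if no language is given, warn and fall back to English.
    if language is None:
        logger.warning(
            "Defaulting to language='en'; pass `language` explicitly "
            "to silence this warning.")
        language = "en"
    return super().validate_language(language)
```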
@@ -177,7 +176,6 @@
        return int(audio_duration_s * stt_config.sample_rate // 320)  # example
```

-
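Only the closing line of the `get_num_audio_tokens` example survives in this hunk; a fuller sketch around it (the 320-samples-per-token divisor is the example's own, not a universal constant):

```python
from typing import Optional

from vllm.config import ModelConfig, SpeechToTextConfig


# Inside your model class:
@classmethod
def get_num_audio_tokens(
    cls,
    audio_duration_s: float,
    stt_config: SpeechToTextConfig,
    model_config: ModelConfig,
) -> Optional[int]:
    # Cheap duration-based estimate used for usage metrics; avoids an
    # extra forward pass. Return None if no estimate is available.
    return int(audio_duration_s * stt_config.sample_rate // 320)  # example
```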
## 2. Audio preprocessing and chunking

The API server takes care of basic audio I/O and optional chunking before building prompts:
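The bullet list detailing these steps sits outside the hunk. Conceptually, the server-side handling reduces to something like the sketch below; librosa-based resampling and naive fixed-length chunking are simplifying assumptions, and the real server also performs energy-based splitting:

```python
import librosa
import numpy as np


def preprocess_audio(y: np.ndarray, orig_sr: int, target_sr: int,
                     max_clip_s: float) -> list[np.ndarray]:
    # Resample to the model's expected rate (stt_config.sample_rate),
    # then split clips longer than max_clip_s into fixed-size chunks.
    y = librosa.resample(y, orig_sr=orig_sr, target_sr=target_sr)
    chunk = int(max_clip_s * target_sr)
    return [y[i:i + chunk] for i in range(0, len(y), chunk)] or [y]
```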
@@ -264,10 +262,11 @@
  -F "model=$MODEL_ID" \
  http://localhost:8000/v1/audio/translations
```

Or check out more examples in <gh-file:examples/online_serving>.
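The curl tail above targets the translations endpoint; the same endpoints can also be exercised from Python with the official `openai` client (the model ID and file name below are placeholders):

```python
from openai import OpenAI

# Assumes a vLLM server with an ASR-capable model on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as f:
    result = client.audio.transcriptions.create(
        model="your-asr-model",  # placeholder model ID
        file=f,
    )
print(result.text)
```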

!!! note

    - If your model handles chunking internally (e.g., via its processor or encoder), set `min_energy_split_window_size=None` in the returned `SpeechToTextConfig` to disable server-side chunking.
    - Implementing `get_num_audio_tokens` improves the accuracy of streaming usage metrics (`prompt_tokens`) without an extra forward pass.
    - For multilingual behavior, keep `supported_languages` aligned with actual model capabilities.