mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-18 07:47:07 +08:00

[Bugfix] Merge MM embeddings by index instead of token IDs (#16229)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>

2025-09-27 08:15:12 +00:00

basic.md

[Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824)

2025-08-29 18:47:58 +00:00

multimodal.md

[Bugfix] Merge MM embeddings by index instead of token IDs (#16229)

2025-09-27 08:15:12 +00:00

README.md

[Docs] fix invalid doc link (#25017)

2025-09-16 20:53:23 -07:00

registration.md

Stop using title frontmatter and fix doc that can only be reached by search (#20623)

2025-07-08 03:27:40 -07:00

tests.md

Stop using title frontmatter and fix doc that can only be reached by search (#20623)

2025-07-08 03:27:40 -07:00

transcription.md

[Docs] Fix formatting of transcription doc (#24676)

2025-09-11 11:18:06 -07:00

README.md

Summary

!!! important
    Many decoder language models can now be automatically loaded using the [Transformers backend][transformers-backend] without having to implement them in vLLM. See if `vllm serve <model>` works first!

vLLM models are specialized PyTorch models that take advantage of various features to optimize their performance.

The complexity of integrating a model into vLLM depends heavily on the model's architecture. The process is fairly straightforward if the model shares a similar architecture with an existing model in vLLM. However, it can be more complex for models that include new operators (e.g., a new attention mechanism).

Read through these pages for a step-by-step guide:

!!! tip
    If you encounter issues while integrating your model into vLLM, feel free to open a GitHub issue or ask on our developer Slack. We will be happy to help you out!