mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-31 02:21:20 +08:00


# Features

## Compatibility Matrix

The tables below show which features can be combined (and which are mutually exclusive), and which features are supported on each hardware platform.

The symbols used have the following meanings:

- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
- ❔ = Unknown or TBD

!!! note
    Where a ❌ or 🟠 entry links to a tracking issue, follow the link for details on the unsupported feature or hardware combination.

### Feature x Feature

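Row and column labels are abbreviated: CP = chunked prefill, APC = automatic prefix caching, SD = speculative decoding, pooling = pooling models, enc-dec = encoder-decoder models, logP = logprobs, prmpt logP = prompt logprobs, async output = async output processing, multi-step = multi-step scheduling, mm = multimodal inputs.

| Feature | CP | APC | LoRA | SD | CUDA graph | pooling | enc-dec | logP | prmpt logP | async output | multi-step | mm | best-of | beam-search | prompt-embeds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CP | ✅ |
| APC | ✅ | ✅ |
| LoRA | ✅ | ✅ | ✅ |
| SD | ✅ | ✅ | ❌ | ✅ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| pooling | 🟠\* | 🟠\* | ✅ | ❌ | ✅ | ✅ |
| enc-dec | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| async output | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| multi-step | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| mm | ✅ | ✅ | 🟠^ | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |
| best-of | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ❔ | ✅ | ✅ |
| prompt-embeds | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ |

\* For pooling models, chunked prefill and prefix caching only apply to last-token pooling.

^ LoRA is only applicable to the language backbone of multimodal models.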
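The Feature x Feature table only fills in its lower triangle, because whether two features can be combined does not depend on the order in which they are named. To check a pair, read the row of whichever feature comes later in the column order. The following is a minimal Python sketch of that lookup; it is a hypothetical helper, not part of vLLM. It transcribes the table above with ASCII codes (Y = full, P = partial, N = none, ? = unknown) and drops the footnote markers, so the footnotes still apply to partial entries.

```python
# Hypothetical helper, not part of vLLM: encodes the Feature x Feature table.
# Codes: Y = full, P = partial, N = none, ? = unknown (footnote markers dropped).
FEATURES = [
    "CP", "APC", "LoRA", "SD", "CUDA graph", "pooling", "enc-dec", "logP",
    "prmpt logP", "async output", "multi-step", "mm", "best-of",
    "beam-search", "prompt-embeds",
]

# Row i holds the entries for columns 0..i, exactly as in the lower-triangular table.
ROWS = [
    "Y",                # CP
    "YY",               # APC
    "YYY",              # LoRA
    "YYNY",             # SD
    "YYYYY",            # CUDA graph
    "PPYNYY",           # pooling
    "NNNNYYY",          # enc-dec
    "YYYYYNYY",         # logP
    "YYYYYNYYY",        # prmpt logP
    "YYYNYNNYYY",       # async output
    "NYNNYNNYYYY",      # multi-step
    "YYP?YYYYYY?Y",     # mm
    "YYYNYNYYY?NYY",    # best-of
    "YYYNYNYYY?N?YY",   # beam-search
    "YYYNYNNYN??N??Y",  # prompt-embeds
]
# Sanity check: row i must have exactly i + 1 entries.
assert all(len(row) == i + 1 for i, row in enumerate(ROWS)), "table is not triangular"

MEANING = {"Y": "full", "P": "partial", "N": "none", "?": "unknown"}


def compatibility(a: str, b: str) -> str:
    """Return the compatibility level of features a and b (argument order does not matter)."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    # Only the lower triangle is stored, so read the row of the later feature.
    row, col = max(i, j), min(i, j)
    return MEANING[ROWS[row][col]]
```

For example, `compatibility("LoRA", "SD")` and `compatibility("SD", "LoRA")` both return `"none"`, while `compatibility("pooling", "CP")` returns `"partial"`.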

### Feature x Hardware

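Volta, Turing, Ampere, Ada and Hopper are NVIDIA GPU architectures.

| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU | Intel GPU |
|---|---|---|---|---|---|---|---|---|---|
| CP | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| APC | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LoRA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SD | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | 🟠 |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| mm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | 🟠 |
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| prompt-embeds | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ |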
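The Feature x Hardware table can also be read column by column, for example to list every feature with full support on one platform. The Python sketch below is a hypothetical helper, not part of vLLM; it transcribes the table above using ASCII codes (Y = full, P = partial, N = none, ? = unknown), one string per feature with one character per platform.

```python
# Hypothetical helper, not part of vLLM: encodes the Feature x Hardware table.
# Codes: Y = full, P = partial, N = none, ? = unknown.
HARDWARE = ["Volta", "Turing", "Ampere", "Ada", "Hopper", "CPU", "AMD", "TPU", "Intel GPU"]

# One character per platform, in the same order as HARDWARE.
SUPPORT = {
    "CP":            "NYYYYYYYY",
    "APC":           "NYYYYYYYY",
    "LoRA":          "YYYYYYYYY",
    "SD":            "YYYYYNYNP",
    "CUDA graph":    "YYYYYNYNN",
    "pooling":       "YYYYYYYNY",
    "enc-dec":       "YYYYYYNNY",
    "mm":            "YYYYYYYNP",
    "logP":          "YYYYYYYNY",
    "prmpt logP":    "YYYYYYYNY",
    "async output":  "YYYYYNNNY",
    "multi-step":    "YYYYYNYNY",
    "best-of":       "YYYYYYYNY",
    "beam-search":   "YYYYYYYNY",
    "prompt-embeds": "YYYYYY?NY",
}
# Sanity check: every feature has exactly one entry per platform.
assert all(len(row) == len(HARDWARE) for row in SUPPORT.values())


def features_with(platform: str, level: str = "Y") -> list[str]:
    """List features whose entry for `platform` has the given code (default: full support)."""
    col = HARDWARE.index(platform)
    # Dict insertion order keeps the result in table order.
    return [feature for feature, row in SUPPORT.items() if row[col] == level]
```

For example, `features_with("TPU")` returns `["CP", "APC", "LoRA"]`, and `features_with("Intel GPU", "P")` returns `["SD", "mm"]`.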