# Compatibility Matrix

The tables below show which features are mutually exclusive and which features are supported on specific hardware.
The symbols used have the following meanings:
- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
!!! note
    Check the ❌ or 🟠 cells with links to see the tracking issue for the unsupported feature or hardware combination.

## Feature x Feature

| Feature | [CP][chunked-prefill] | [APC][automatic-prefix-caching] | [LoRA][lora-adapter] | prompt adapter | [SD][spec-decode] | CUDA graph | pooling | enc-dec | logprobs | prompt logprobs | async output | multi-step | multimodal | best-of | beam-search |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ✅ | | | | | | | | | | | | | | |
| [APC][automatic-prefix-caching] | ✅ | ✅ | | | | | | | | | | | | | |
| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | | | | | | | | | | | | |
| prompt adapter | ✅ | ✅ | ✅ | ✅ | | | | | | | | | | | |
| [SD][spec-decode] | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | | | | | |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | | |
| pooling | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | | | | | | | | |
| enc-dec | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | | | | |
| logprobs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | |
| prompt logprobs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | | | | | |
| async output | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | |
| multi-step | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | | | |
| multimodal | ✅ | 🟠 | 🟠 | ❔ | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | |
| best-of | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ | ✅ | |
| beam-search | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ❔ | ✅ | ✅ |
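
The Feature x Feature table is lower-triangular: to look up a pair, find the row of whichever feature appears later in the list and the column of the other. As a rough illustration of that symmetric lookup, the sketch below encodes a handful of the ❌ cells as unordered pairs. The feature keys, the `INCOMPATIBLE` set, and the `find_conflicts` helper are hypothetical names made up for this example; they are not part of the vLLM API, and the set deliberately covers only a subset of the table.

```python
from itertools import combinations

# Hypothetical, partial encoding of some ❌ cells from the table above.
# Pairs are frozensets because compatibility does not depend on order.
INCOMPATIBLE = {
    frozenset({"spec-decode", "lora"}),
    frozenset({"multi-step", "chunked-prefill"}),
    frozenset({"multi-step", "lora"}),
    frozenset({"multi-step", "spec-decode"}),
    frozenset({"async output", "spec-decode"}),
    frozenset({"best-of", "spec-decode"}),
    frozenset({"best-of", "multi-step"}),
    frozenset({"beam-search", "spec-decode"}),
    frozenset({"beam-search", "multi-step"}),
}


def find_conflicts(features):
    """Return the incompatible pairs found among the requested features."""
    conflicts = []
    # Sorting makes the result order deterministic.
    for a, b in combinations(sorted(set(features)), 2):
        if frozenset({a, b}) in INCOMPATIBLE:
            conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    print(find_conflicts(["lora", "spec-decode", "chunked-prefill"]))
```

A pair missing from this set only means it was not encoded here; the table remains the source of truth, including for 🟠 and ❔ cells.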

## Feature x Hardware

| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD |
|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [APC][automatic-prefix-caching] | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| prompt adapter | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ |
| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| multimodal | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| logprobs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| prompt logprobs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

!!! note
    Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.
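
The Feature x Hardware table can be read as a per-platform list of unsupported features. The sketch below transcribes only the ❌ cells into a dictionary and filters a requested feature list against it. The `UNSUPPORTED` mapping, the feature keys, and the `unsupported_on` helper are illustrative assumptions for this example, not vLLM identifiers, and the ❔ entry (pooling on AMD) is intentionally left out because its status is not stated.

```python
# Hypothetical transcription of the ❌ cells in the Feature x Hardware table.
# Platforms with an empty set have no ❌ entries in the table.
UNSUPPORTED = {
    "Volta": {"chunked-prefill", "automatic-prefix-caching"},
    "Turing": set(),
    "Ampere": set(),
    "Ada": set(),
    "Hopper": set(),
    "CPU": {"prompt adapter", "CUDA graph", "async output", "multi-step"},
    "AMD": {"enc-dec", "async output"},
}


def unsupported_on(hardware, features):
    """Return the requested features marked ❌ for the given hardware."""
    # A KeyError here means the hardware is not a column in the table.
    blocked = UNSUPPORTED[hardware]
    return sorted(set(features) & blocked)


if __name__ == "__main__":
    print(unsupported_on("CPU", ["CUDA graph", "lora", "multi-step"]))
```

Raising `KeyError` for an unknown platform is a deliberate choice in this sketch: hardware that is absent from the table, such as AWS Neuron, should not be silently reported as fully supported.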