diff --git a/docs/.nav.yml b/docs/.nav.yml index e679807f7534..06bfcc3f1eff 100644 --- a/docs/.nav.yml +++ b/docs/.nav.yml @@ -39,6 +39,7 @@ nav: - models/generative_models.md - models/pooling_models.md - models/extensions + - Hardware Supported Models: models/hardware_supported_models - Features: - features/compatibility_matrix.md - features/* diff --git a/docs/features/compatibility_matrix.md b/docs/features/compatibility_matrix.md index 5d448eb5c03d..4f475ee4db83 100644 --- a/docs/features/compatibility_matrix.md +++ b/docs/features/compatibility_matrix.md @@ -59,23 +59,23 @@ th:not(:first-child) { ## Feature x Hardware -| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | -|-----------------------------------------------------------|--------------------|----------|----------|-------|----------|--------------------|-------| -| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| prmpt adptr | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | -| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | -| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | -| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | -| mm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | -| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | -| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU | +|-----------------------------------------------------------|---------------------|-----------|-----------|--------|------------|--------------------|--------|-----| +| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ 
| ✅ | ✅ | +| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| prmpt adptr | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | ❌ | +| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | +| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | +| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ | +| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | +| mm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | +| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | +| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | +| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | +| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | ❌ | +| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | +| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | !!! note Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware diff --git a/docs/models/hardware_supported_models/tpu.md b/docs/models/hardware_supported_models/tpu.md new file mode 100644 index 000000000000..dca5e20cb343 --- /dev/null +++ b/docs/models/hardware_supported_models/tpu.md @@ -0,0 +1,36 @@ +--- +title: TPU +--- +[](){ #tpu-supported-models } + +# TPU Supported Models +## Text-only Language Models + +| Model | Architecture | Supported | +|-----------------------------------------------------|--------------------------------|-----------| +| mistralai/Mixtral-8x7B-Instruct-v0.1 | MixtralForCausalLM | 🟨 | +| mistralai/Mistral-Small-24B-Instruct-2501 | MistralForCausalLM | ✅ | +| mistralai/Codestral-22B-v0.1 | MistralForCausalLM | ✅ | +| mistralai/Mixtral-8x22B-Instruct-v0.1 | MixtralForCausalLM | ❌ | +| meta-llama/Llama-3.3-70B-Instruct | LlamaForCausalLM | ✅ | +| meta-llama/Llama-3.1-8B-Instruct | LlamaForCausalLM | ✅ | +| meta-llama/Llama-3.1-70B-Instruct | LlamaForCausalLM | ✅ | +| meta-llama/Llama-4-* | Llama4ForConditionalGeneration | ❌ | +| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM | 🟨 | +| microsoft/phi-4 | Phi3ForCausalLM | ❌ | +| google/gemma-3-27b-it | 
Gemma3ForConditionalGeneration | 🟨 | +| google/gemma-3-4b-it | Gemma3ForConditionalGeneration | ❌ | +| deepseek-ai/DeepSeek-R1 | DeepseekV3ForCausalLM | ❌ | +| deepseek-ai/DeepSeek-V3 | DeepseekV3ForCausalLM | ❌ | +| RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ | +| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ | +| Qwen/Qwen3-8B | Qwen3ForCausalLM | ✅ | +| Qwen/Qwen3-32B | Qwen3ForCausalLM | ✅ | +| Qwen/Qwen2.5-7B-Instruct | Qwen2ForCausalLM | ✅ | +| Qwen/Qwen2.5-32B | Qwen2ForCausalLM | ✅ | +| Qwen/Qwen2.5-14B-Instruct | Qwen2ForCausalLM | ✅ | +| Qwen/Qwen2.5-1.5B-Instruct | Qwen2ForCausalLM | 🟨 | + +✅ Runs and is optimized. +🟨 Runs correctly but is not yet fully optimized. +❌ Does not run, or does not pass the accuracy test.
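The three support tiers in the new tpu.md page can be mirrored as a small lookup table, e.g. for tooling that wants to gate model selection by hardware before launching a server. This is an illustrative sketch only: `tpu_support` and `TPU_SUPPORT` are hypothetical names, not part of the vLLM API, and the entries are copied from the table above (support is per model checkpoint, not per architecture — note Mixtral-8x7B is 🟨 while Mixtral-8x22B is ❌).

```python
# Illustrative only: a lookup mirroring the TPU support table above.
# Tiers: "optimized" = ✅, "runs" = 🟨 (correct but unoptimized), "unsupported" = ❌.
TPU_SUPPORT = {
    "meta-llama/Llama-3.1-8B-Instruct": "optimized",       # ✅
    "Qwen/Qwen3-8B": "optimized",                          # ✅
    "mistralai/Mixtral-8x7B-Instruct-v0.1": "runs",        # 🟨
    "mistralai/Mixtral-8x22B-Instruct-v0.1": "unsupported",  # ❌
    "deepseek-ai/DeepSeek-R1": "unsupported",              # ❌
}


def tpu_support(model: str) -> str:
    """Return the TPU support tier for a model; models absent from the
    table have simply not been classified yet."""
    return TPU_SUPPORT.get(model, "untested")


print(tpu_support("Qwen/Qwen3-8B"))  # optimized
```

A real integration would more likely key on the model's `architectures` field from its HF config, but since the table distinguishes checkpoints sharing an architecture (the two Mixtral variants), the checkpoint name is the safer key here.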