[doc] fix "Other AI accelerators" getting started page (#19457)
Signed-off-by: David Xia <david@davidxia.com>
parent 497a91e9f7
commit 89b0f84e17
@@ -19,7 +19,8 @@ to set up the execution environment. To achieve the best performance,
please follow the methods outlined in the
[Optimizing Training Platform Guide](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_Training_Platform.html).
## Configure a new environment
# --8<-- [end:requirements]
# --8<-- [start:configure-a-new-environment]
### Environment verification
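A minimal sketch of this verification step, assuming the Intel Gaudi driver stack and its `hl-smi` utility are installed on the host (the exact checks in the full guide may differ):

```bash
# List the Gaudi accelerators visible to the driver, along with driver and
# firmware versions; an empty listing points to an incomplete installation.
hl-smi
```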
@@ -56,7 +57,7 @@ docker run \
vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
```
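Only the image tag is visible in the hunk above. For orientation, a complete launch command might look roughly like the sketch below; the flags follow common Habana container-runtime conventions and may not match the guide's exact invocation:

```bash
# Sketch: start the Gaudi PyTorch container with the Habana runtime,
# exposing all HPUs and sharing the host network and IPC namespaces.
docker run -it \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  --cap-add=sys_nice \
  --net=host \
  --ipc=host \
  vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
```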
# --8<-- [end:requirements]
# --8<-- [end:configure-a-new-environment]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]
@@ -183,7 +184,6 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
| 0 | 0 | torch.compile |
| 0 | 1 | PyTorch eager mode |
| 1 | 0 | HPU Graphs |
<figcaption>vLLM execution modes</figcaption>
!!! warning
    In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should only be used for validating functional correctness. Their performance will be improved in upcoming releases. For the best performance in 1.18.0, use HPU Graphs or PyTorch lazy mode.
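To make the mode selection concrete, here is a hedged sketch of launching the server in two of the modes, assuming the table's columns are `PT_HPU_LAZY_MODE` and the `--enforce-eager` flag as described in the surrounding guide; the model name is only a placeholder:

```bash
# HPU Graphs (PT_HPU_LAZY_MODE=1, eager execution not enforced),
# the recommended configuration for 1.18.0.
PT_HPU_LAZY_MODE=1 vllm serve meta-llama/Llama-3.1-8B-Instruct

# PyTorch eager mode (PT_HPU_LAZY_MODE=0 with --enforce-eager), experimental in 1.18.0.
PT_HPU_LAZY_MODE=0 vllm serve meta-llama/Llama-3.1-8B-Instruct --enforce-eager
```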
@@ -17,7 +17,8 @@
- Accelerator: NeuronCore-v2 (in trn1/inf2 chips) or NeuronCore-v3 (in trn2 chips)
- AWS Neuron SDK 2.23
## Configure a new environment
# --8<-- [end:requirements]
# --8<-- [start:configure-a-new-environment]
### Launch a Trn1/Trn2/Inf2 instance and verify Neuron dependencies
@@ -37,7 +38,7 @@ for alternative setup instructions including using Docker and manually installin
NxD Inference is the default recommended backend to run inference on Neuron. If you are looking to use the legacy [transformers-neuronx](https://github.com/aws-neuron/transformers-neuronx)
library, refer to [Transformers NeuronX Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/setup/index.html).
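For the dependency-verification step above, a quick sketch assuming the Neuron SDK command-line tools (e.g. the `aws-neuron-tools` package) are installed on the instance:

```bash
# List the NeuronCore devices attached to this instance; an empty listing usually
# means the instance type has no Neuron accelerators or the driver is not loaded.
neuron-ls

# Show which Neuron-related Python packages (compiler, runtime, NxD libraries) are present.
pip list | grep -i neuron
```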
# --8<-- [end:requirements]
# --8<-- [end:configure-a-new-environment]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]
@@ -58,11 +58,13 @@ assigned to your Google Cloud project for your immediate exclusive use.
### Provision Cloud TPUs with GKE
For more information about using TPUs with GKE, see:
- <https://cloud.google.com/kubernetes-engine/docs/how-to/tpus>
- <https://cloud.google.com/kubernetes-engine/docs/concepts/tpus>
- <https://cloud.google.com/kubernetes-engine/docs/concepts/plan-tpus>
## Configure a new environment
# --8<-- [end:requirements]
# --8<-- [start:configure-a-new-environment]
### Provision a Cloud TPU with the queued resource API
@@ -81,12 +83,12 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
| Parameter name | Description |
|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| QUEUED_RESOURCE_ID | The user-assigned ID of the queued resource request. |
| TPU_NAME | The user-assigned name of the TPU which is created when the queued |
| TPU_NAME | The user-assigned name of the TPU which is created when the queued resource request is allocated. |
| PROJECT_ID | Your Google Cloud project |
| ZONE | The GCP zone where you want to create your Cloud TPU. The value you use |
| ACCELERATOR_TYPE | The TPU version you want to use. Specify the TPU version, for example |
| RUNTIME_VERSION | The TPU VM runtime version to use. For example, use `v2-alpha-tpuv6e` for a VM loaded with one or more v6e TPU(s). For more information see [TPU VM images](https://cloud.google.com/tpu/docs/runtimes). |
<figcaption>Parameter descriptions</figcaption>
| ZONE | The GCP zone where you want to create your Cloud TPU. The value you use depends on the version of TPUs you are using. For more information, see [TPU regions and zones] |
| ACCELERATOR_TYPE | The TPU version you want to use. Specify the TPU version, for example `v5litepod-4` specifies a v5e TPU with 4 cores, `v6e-1` specifies a v6e TPU with 1 core. For more information, see [TPU versions]. |
| RUNTIME_VERSION | The TPU VM runtime version to use. For example, use `v2-alpha-tpuv6e` for a VM loaded with one or more v6e TPU(s). For more information see [TPU VM images]. |
| SERVICE_ACCOUNT | The email address for your service account. You can find it in the IAM Cloud Console under *Service Accounts*. For example: `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com` |
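Putting the parameters above together, here is a purely illustrative request for a single-core v6e TPU. Every value is a placeholder, the flag spellings reflect how these parameters are commonly passed and may differ from the guide's full command, and valid `ACCELERATOR_TYPE`, `RUNTIME_VERSION`, and zone combinations should be checked against the linked TPU docs:

```bash
# Illustrative sketch only: substitute your own IDs, project, zone, and service account.
gcloud alpha compute tpus queued-resources create my-queued-resource \
  --node-id my-tpu-vm \
  --project my-gcp-project \
  --zone us-east5-a \
  --accelerator-type v6e-1 \
  --runtime-version v2-alpha-tpuv6e \
  --service-account tpu-service-account@my-gcp-project.iam.gserviceaccount.com
```

The request is asynchronous: the TPU VM only becomes reachable once the queued resource has been allocated, at which point the SSH command below can be used.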
Connect to your TPU using SSH:
@@ -94,7 +96,11 @@ Connect to your TPU using SSH:
gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
```
# --8<-- [end:requirements]
[TPU versions]: https://cloud.google.com/tpu/docs/runtimes
[TPU VM images]: https://cloud.google.com/tpu/docs/runtimes
[TPU regions and zones]: https://cloud.google.com/tpu/docs/regions-zones
# --8<-- [end:configure-a-new-environment]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]