diff --git a/docs/cli/README.md b/docs/cli/README.md
index b2587a5e7cd2b..3541437659cac 100644
--- a/docs/cli/README.md
+++ b/docs/cli/README.md
@@ -16,7 +16,7 @@ vllm {chat,complete,serve,bench,collect-env,run-batch}
 Start the vLLM OpenAI Compatible API server.
-??? Examples
+??? console "Examples"
     ```bash
     # Start with a model
diff --git a/docs/configuration/conserving_memory.md b/docs/configuration/conserving_memory.md
index e2303067e3ee8..2b09498f79007 100644
--- a/docs/configuration/conserving_memory.md
+++ b/docs/configuration/conserving_memory.md
@@ -57,7 +57,7 @@ By default, we optimize model inference using CUDA graphs which take up extra me
 You can adjust `compilation_config` to achieve a better balance between inference speed and memory usage:
-??? Code
+??? code
     ```python
     from vllm import LLM
@@ -129,7 +129,7 @@ reduce the size of the processed multi-modal inputs, which in turn saves memory.
 Here are some examples:
-??? Code
+??? code
     ```python
     from vllm import LLM
diff --git a/docs/configuration/env_vars.md b/docs/configuration/env_vars.md
index c875931c305b6..2c0a898754fa0 100644
--- a/docs/configuration/env_vars.md
+++ b/docs/configuration/env_vars.md
@@ -7,7 +7,7 @@ vLLM uses the following environment variables to configure the system:
 All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
-??? Code
+??? code
     ```python
     --8<-- "vllm/envs.py:env-vars-definition"
diff --git a/docs/contributing/README.md b/docs/contributing/README.md
index 83525436be139..f2d439e37ccc6 100644
--- a/docs/contributing/README.md
+++ b/docs/contributing/README.md
@@ -95,7 +95,7 @@ For additional features and advanced configurations, refer to the official [MkDo
 ## Testing
-??? note "Commands"
+??? console "Commands"
     ```bash
     pip install -r requirements/dev.txt
diff --git a/docs/contributing/model/basic.md b/docs/contributing/model/basic.md
index d552cd06be204..78289bf381d77 100644
--- a/docs/contributing/model/basic.md
+++ b/docs/contributing/model/basic.md
@@ -27,7 +27,7 @@ All vLLM modules within the model must include a `prefix` argument in their cons
 The initialization code should look like this:
-??? Code
+??? code
     ```python
     from torch import nn
diff --git a/docs/contributing/model/multimodal.md b/docs/contributing/model/multimodal.md
index 64daa9c2d4cdd..201ace0ab0802 100644
--- a/docs/contributing/model/multimodal.md
+++ b/docs/contributing/model/multimodal.md
@@ -12,7 +12,7 @@ Further update the model as follows:
 - Implement [get_placeholder_str][vllm.model_executor.models.interfaces.SupportsMultiModal.get_placeholder_str] to define the placeholder string which is used to represent the multi-modal item in the text prompt. This should be consistent with the chat template of the model.
-    ??? Code
+    ??? code
         ```python
        class YourModelForImage2Seq(nn.Module):
@@ -41,7 +41,7 @@ Further update the model as follows:
 - Implement [get_multimodal_embeddings][vllm.model_executor.models.interfaces.SupportsMultiModal.get_multimodal_embeddings] that returns the embeddings from running the multimodal inputs through the multimodal tokenizer of the model. Below we provide a boilerplate of a typical implementation pattern, but feel free to adjust it to your own needs.
-    ??? Code
+    ??? code
         ```python
         class YourModelForImage2Seq(nn.Module):
@@ -71,7 +71,7 @@ Further update the model as follows:
 - Implement [get_input_embeddings][vllm.model_executor.models.interfaces.SupportsMultiModal.get_input_embeddings] to merge `multimodal_embeddings` with text embeddings from the `input_ids`. If input processing for the model is implemented correctly (see sections below), then you can leverage the utility function we provide to easily merge the embeddings.
-    ??? Code
+    ??? code
         ```python
         from .utils import merge_multimodal_embeddings
@@ -155,7 +155,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 Looking at the code of HF's `LlavaForConditionalGeneration`:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.47.1/src/transformers/models/llava/modeling_llava.py#L530-L544
@@ -179,7 +179,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 The number of placeholder feature tokens per image is `image_features.shape[1]`. `image_features` is calculated inside the `get_image_features` method:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.47.1/src/transformers/models/llava/modeling_llava.py#L290-L300
@@ -217,7 +217,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 To find the sequence length, we turn to the code of `CLIPVisionEmbeddings`:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.47.1/src/transformers/models/clip/modeling_clip.py#L247-L257
@@ -244,7 +244,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 Overall, the number of placeholder feature tokens for an image can be calculated as:
-    ??? Code
+    ??? code
         ```python
         def get_num_image_tokens(
@@ -269,7 +269,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 Notice that the number of image tokens doesn't depend on the image width and height. We can simply use a dummy `image_size` to calculate the multimodal profiling data:
-    ??? Code
+    ??? code
         ```python
         # NOTE: In actuality, this is usually implemented as part of the
@@ -314,7 +314,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 Looking at the code of HF's `FuyuForCausalLM`:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/fuyu/modeling_fuyu.py#L311-L322
@@ -344,7 +344,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 In `FuyuImageProcessor.preprocess`, the images are resized and padded to the target `FuyuImageProcessor.size`, returning the dimensions after resizing (but before padding) as metadata.
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/fuyu/processing_fuyu.py#L541-L544
@@ -382,7 +382,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 In `FuyuImageProcessor.preprocess_with_tokenizer_info`, the images are split into patches based on this metadata:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/fuyu/processing_fuyu.py#L417-L425
@@ -420,7 +420,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 The number of patches is in turn defined by `FuyuImageProcessor.get_num_patches`:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/fuyu/image_processing_fuyu.py#L552-L562
@@ -457,7 +457,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 For the multimodal image profiling data, the logic is very similar to LLaVA:
-    ??? Code
+    ??? code
         ```python
         def get_dummy_mm_data(
@@ -546,7 +546,7 @@ return a schema of the tensors outputted by the HF processor that are related to
 In order to support the use of [MultiModalFieldConfig.batched][] like in LLaVA, we remove the extra batch dimension by overriding [BaseMultiModalProcessor._call_hf_processor][]:
-    ??? Code
+    ??? code
         ```python
         def _call_hf_processor(
@@ -623,7 +623,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 It simply repeats each input `image_token` a number of times equal to the number of placeholder feature tokens (`num_image_tokens`). Based on this, we override [_get_prompt_updates][vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates] as follows:
-    ??? Code
+    ??? code
         ```python
         def _get_prompt_updates(
@@ -668,7 +668,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 We define a helper function to return `ncols` and `nrows` directly:
-    ??? Code
+    ??? code
         ```python
         def get_image_feature_grid_size(
@@ -698,7 +698,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 Based on this, we can initially define our replacement tokens as:
-    ??? Code
+    ??? code
         ```python
         def get_replacement(item_idx: int):
@@ -718,7 +718,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 However, this is not entirely correct. After `FuyuImageProcessor.preprocess_with_tokenizer_info` is called, a BOS token (``) is also added to the prompt:
-    ??? Code
+    ??? code
         ```python
         # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/fuyu/processing_fuyu.py#L417-L435
@@ -745,7 +745,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 To assign the vision embeddings to only the image tokens, instead of a string you can return an instance of [PromptUpdateDetails][vllm.multimodal.processing.PromptUpdateDetails]:
-    ??? Code
+    ??? code
         ```python
         hf_config = self.info.get_hf_config()
@@ -772,7 +772,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 Finally, noticing that the HF processor removes the `|ENDOFTEXT|` token from the tokenized prompt, we can search for it to conduct the replacement at the start of the string:
-    ??? Code
+    ??? code
         ```python
         def _get_prompt_updates(
diff --git a/docs/contributing/profiling.md b/docs/contributing/profiling.md
index 20f4867057d3e..a5851cfe963d2 100644
--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -125,7 +125,7 @@ to manually kill the profiler and generate your `nsys-rep` report.
 You can view these profiles either as summaries in the CLI, using `nsys stats [profile-file]`, or in the GUI by installing Nsight [locally following the directions here](https://developer.nvidia.com/nsight-systems/get-started).
-??? CLI example
+??? console "CLI example"
     ```bash
     nsys stats report1.nsys-rep
diff --git a/docs/deployment/docker.md b/docs/deployment/docker.md
index 5f6a22c28c28e..38633860b6179 100644
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -97,7 +97,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
 flags to speed up build process. However, ensure your `max_jobs` is substantially larger than `nvcc_threads` to get the most benefits. Keep an eye on memory usage with parallel jobs as it can be substantial (see example below).
-??? Command
+??? console "Command"
     ```bash
     # Example of building on Nvidia GH200 server.
     # (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
diff --git a/docs/deployment/frameworks/autogen.md b/docs/deployment/frameworks/autogen.md
index 13930e67ab2f5..91127bed2854e 100644
--- a/docs/deployment/frameworks/autogen.md
+++ b/docs/deployment/frameworks/autogen.md
@@ -30,7 +30,7 @@ python -m vllm.entrypoints.openai.api_server \
 - Call it with AutoGen:
-??? Code
+??? code
     ```python
     import asyncio
diff --git a/docs/deployment/frameworks/cerebrium.md b/docs/deployment/frameworks/cerebrium.md
index 5c5f2f48d50b7..d47773dd0c86e 100644
--- a/docs/deployment/frameworks/cerebrium.md
+++ b/docs/deployment/frameworks/cerebrium.md
@@ -34,7 +34,7 @@ vllm = "latest"
 Next, let us add the code to handle inference for the LLM of your choice (`mistralai/Mistral-7B-Instruct-v0.1` for this example). Add the following code to your `main.py`:
-??? Code
+??? code
     ```python
     from vllm import LLM, SamplingParams
@@ -64,7 +64,7 @@ cerebrium deploy
 If successful, you should be returned a curl command that you can use to call inference. Just remember to end the URL with the name of the function you are calling (in our case `/run`):
-??? Command
+??? console "Command"
    ```bash
    curl -X POST https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run \
@@ -82,7 +82,7 @@ If successful, you should be returned a CURL command that you can call inference
 You should get a response like:
-??? Response
+??? console "Response"
    ```json
    {
diff --git a/docs/deployment/frameworks/dstack.md b/docs/deployment/frameworks/dstack.md
index 8b4bc459683b0..8be655e23a2ea 100644
--- a/docs/deployment/frameworks/dstack.md
+++ b/docs/deployment/frameworks/dstack.md
@@ -26,7 +26,7 @@ dstack init
 Next, to provision a VM instance with the LLM of your choice (`NousResearch/Llama-2-7b-chat-hf` for this example), create the following `serve.dstack.yml` file for the dstack `Service`:
-??? Config
+??? code "Config"
    ```yaml
    type: service
@@ -48,7 +48,7 @@ Next, to provision a VM instance with LLM of your choice (`NousResearch/Llama-2-
 Then, run the following CLI for provisioning:
-??? Command
+??? console "Command"
    ```console
    $ dstack run . -f serve.dstack.yml
@@ -79,7 +79,7 @@ Then, run the following CLI for provisioning:
 After the provisioning, you can interact with the model by using the OpenAI SDK:
-??? Code
+??? code
    ```python
    from openai import OpenAI
diff --git a/docs/deployment/frameworks/haystack.md b/docs/deployment/frameworks/haystack.md
index 7a4cab4c2ee35..0a52d017c301d 100644
--- a/docs/deployment/frameworks/haystack.md
+++ b/docs/deployment/frameworks/haystack.md
@@ -27,7 +27,7 @@ vllm serve mistralai/Mistral-7B-Instruct-v0.1
 - Use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server.
-??? Code
+??? code
    ```python
    from haystack.components.generators.chat import OpenAIChatGenerator
diff --git a/docs/deployment/frameworks/litellm.md b/docs/deployment/frameworks/litellm.md
index 8279613b1a273..c7cdd1020f2a9 100644
--- a/docs/deployment/frameworks/litellm.md
+++ b/docs/deployment/frameworks/litellm.md
@@ -34,7 +34,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
 - Call it with litellm:
-??? Code
+??? code
    ```python
    import litellm
diff --git a/docs/deployment/frameworks/lws.md b/docs/deployment/frameworks/lws.md
index 9df9528769064..d0ca6d6dd054d 100644
--- a/docs/deployment/frameworks/lws.md
+++ b/docs/deployment/frameworks/lws.md
@@ -17,7 +17,7 @@ vLLM can be deployed with [LWS](https://github.com/kubernetes-sigs/lws) on Kuber
 Deploy the following yaml file `lws.yaml`:
-??? Yaml
+??? code "Yaml"
    ```yaml
    apiVersion: leaderworkerset.x-k8s.io/v1
@@ -177,7 +177,7 @@ curl http://localhost:8080/v1/completions \
 The output should be similar to the following:
-??? Output
+??? console "Output"
    ```text
    {
diff --git a/docs/deployment/frameworks/skypilot.md b/docs/deployment/frameworks/skypilot.md
index ecf987539ced4..a0efc50416b40 100644
--- a/docs/deployment/frameworks/skypilot.md
+++ b/docs/deployment/frameworks/skypilot.md
@@ -24,7 +24,7 @@ sky check
 See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypilot-org/skypilot/blob/master/llm/vllm/serve.yaml).
-??? Yaml
+??? code "Yaml"
    ```yaml
    resources:
@@ -95,7 +95,7 @@ HF_TOKEN="your-huggingface-token" \
 SkyPilot can scale up the service to multiple service replicas with built-in autoscaling, load-balancing and fault-tolerance. You can do it by adding a services section to the YAML file.
-??? Yaml
+??? code "Yaml"
    ```yaml
    service:
@@ -111,7 +111,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut
        max_completion_tokens: 1
    ```
-??? Yaml
+??? code "Yaml"
    ```yaml
    service:
@@ -186,7 +186,7 @@ vllm 2 1 xx.yy.zz.245 18 mins ago 1x GCP([Spot]{'L4': 1}) R
 After the service is READY, you can find a single endpoint for the service and access the service with the endpoint:
-??? Commands
+??? console "Commands"
    ```bash
    ENDPOINT=$(sky serve status --endpoint 8081 vllm)
@@ -220,7 +220,7 @@ service:
 This will scale the service up when the QPS exceeds 2 for each replica.
-??? Yaml
+??? code "Yaml"
    ```yaml
    service:
@@ -285,7 +285,7 @@ sky serve down vllm
 It is also possible to access the Llama-3 service with a separate GUI frontend, so the user requests sent to the GUI will be load-balanced across replicas.
-??? Yaml
+??? code "Yaml"
    ```yaml
    envs:
diff --git a/docs/deployment/integrations/production-stack.md b/docs/deployment/integrations/production-stack.md
index 2b1cc6f6fee18..d9e77dd343f5f 100644
--- a/docs/deployment/integrations/production-stack.md
+++ b/docs/deployment/integrations/production-stack.md
@@ -60,7 +60,7 @@ And then you can send out a query to the OpenAI-compatible API to check the avai
 curl -o- http://localhost:30080/models
 ```
-??? Output
+??? console "Output"
    ```json
    {
@@ -89,7 +89,7 @@ curl -X POST http://localhost:30080/completions \
   }'
 ```
-??? Output
+??? console "Output"
    ```json
    {
@@ -121,7 +121,7 @@ sudo helm uninstall vllm
 The core vLLM production stack configuration is managed with YAML. Here is the example configuration used in the installation above:
-??? Yaml
+??? code "Yaml"
    ```yaml
    servingEngineSpec:
diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index f01e3d2fae0eb..84e65603d7b1a 100644
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -29,7 +29,7 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
 First, create a Kubernetes PVC and Secret for downloading and storing the Hugging Face model:
-??? Config
+??? console "Config"
    ```bash
    cat <
diff --git a/docs/design/plugin_system.md b/docs/design/plugin_system.md
index 944f0e680de4d..959c9cefc1c54 100644
--- a/docs/design/plugin_system.md
+++ b/docs/design/plugin_system.md
@@ -13,7 +13,7 @@ Plugins are user-registered code that vLLM executes. Given vLLM's architecture (
 vLLM's plugin system uses the standard Python `entry_points` mechanism. This mechanism allows developers to register functions in their Python packages for use by other packages. An example of a plugin:
-??? Code
+??? code
    ```python
    # inside `setup.py` file
diff --git a/docs/design/v1/p2p_nccl_connector.md b/docs/design/v1/p2p_nccl_connector.md
index 32cdaacf058ae..b1df93cfc85d3 100644
--- a/docs/design/v1/p2p_nccl_connector.md
+++ b/docs/design/v1/p2p_nccl_connector.md
@@ -61,7 +61,7 @@ To address the above issues, I have designed and developed a local Tensor memory
 # Install vLLM
-??? Commands
+??? console "Commands"
    ```shell
    # Enter the home directory or your working directory.
@@ -106,7 +106,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Prefill1 (e.g. 10.0.1.2 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
@@ -128,7 +128,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Decode1 (e.g. 10.0.1.3 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
@@ -150,7 +150,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Decode2 (e.g. 10.0.1.4 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
@@ -172,7 +172,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Decode3 (e.g. 10.0.1.5 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
@@ -203,7 +203,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Prefill1 (e.g. 10.0.1.2 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
@@ -225,7 +225,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Prefill2 (e.g. 10.0.1.3 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
@@ -247,7 +247,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Prefill3 (e.g. 10.0.1.4 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
@@ -269,7 +269,7 @@ python3 disagg_prefill_proxy_xpyd.py &
 ### Decode1 (e.g. 10.0.1.5 or 10.0.1.1)
-??? Command
+??? console "Command"
    ```shell
    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
@@ -304,7 +304,7 @@ curl -X POST -s http://10.0.1.1:10001/v1/completions \
 # Benchmark
-??? Command
+??? console "Command"
    ```shell
    python3 benchmark_serving.py \
diff --git a/docs/design/v1/torch_compile.md b/docs/design/v1/torch_compile.md
index b65099bd62a25..ea5d8ac212f7a 100644
--- a/docs/design/v1/torch_compile.md
+++ b/docs/design/v1/torch_compile.md
@@ -28,7 +28,7 @@ A unique aspect of vLLM's `torch.compile` integration, is that we guarantee all
 In the very verbose logs, we can see:
-??? Logs
+??? console "Logs"
    ```text
    DEBUG 03-07 03:06:52 [decorators.py:203] Start compiling function
@@ -110,7 +110,7 @@ Then it will also compile a specific kernel just for batch size `1, 2, 4, 8`. At
 When all the shapes are known, `torch.compile` can compare different configs, and often find some better configs to run the kernel. For example, we can see the following log:
-??? Logs
+??? console "Logs"
    ```
    AUTOTUNE mm(8x2048, 2048x3072)
diff --git a/docs/features/lora.md b/docs/features/lora.md
index 4ccc3290e56a2..64d40a72994db 100644
--- a/docs/features/lora.md
+++ b/docs/features/lora.md
@@ -29,7 +29,7 @@ We can now submit the prompts and call `llm.generate` with the `lora_request` pa
 of `LoRARequest` is a human identifiable name, the second parameter is a globally unique ID for the adapter and the third parameter is the path to the LoRA adapter.
-??? Code
+??? code
    ```python
    sampling_params = SamplingParams(
@@ -70,7 +70,7 @@ The server entrypoint accepts all other LoRA configuration parameters (`max_lora
 etc.), which will apply to all forthcoming requests.
 Upon querying the `/models` endpoint, we should see our LoRA along with its base model (if `jq` is not installed, you can follow [this guide](https://jqlang.org/download/) to install it):
-??? Command
+??? console "Command"
    ```bash
    curl localhost:8000/v1/models | jq .
@@ -172,7 +172,7 @@ Alternatively, follow these example steps to implement your own plugin:
 1. Implement the LoRAResolver interface.
-    ??? Example of a simple S3 LoRAResolver implementation
+    ??? code "Example of a simple S3 LoRAResolver implementation"
        ```python
        import os
@@ -238,7 +238,7 @@ The new format of `--lora-modules` is mainly to support the display of parent mo
 - The `parent` field of LoRA model `sql-lora` now links to its base model `meta-llama/Llama-2-7b-hf`. This correctly reflects the hierarchical relationship between the base model and the LoRA adapter.
 - The `root` field points to the artifact location of the LoRA adapter.
-??? Command output
+??? console "Command output"
    ```bash
    $ curl http://localhost:8000/v1/models
diff --git a/docs/features/multimodal_inputs.md b/docs/features/multimodal_inputs.md
index ed11d28360378..7c25f6f406a3f 100644
--- a/docs/features/multimodal_inputs.md
+++ b/docs/features/multimodal_inputs.md
@@ -20,7 +20,7 @@ To input multi-modal data, follow this schema in [vllm.inputs.PromptType][]:
 You can pass a single image to the `'image'` field of the multi-modal dictionary, as shown in the following examples:
-??? Code
+??? code
    ```python
    from vllm import LLM
@@ -68,7 +68,7 @@ Full example:
 To substitute multiple images inside the same text prompt, you can pass in a list of images instead:
-??? Code
+??? code
    ```python
    from vllm import LLM
@@ -146,7 +146,7 @@ for o in outputs:
 Multi-image input can be extended to perform video captioning. We show this with [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as it supports videos:
-??? Code
+??? code
    ```python
    from vllm import LLM
@@ -193,7 +193,7 @@ Full example:
 To input pre-computed embeddings belonging to a data type (i.e. image, video, or audio) directly to the language model, pass a tensor of shape `(num_items, feature_size, hidden_size of LM)` to the corresponding field of the multi-modal dictionary.
-??? Code
+??? code
    ```python
    from vllm import LLM
@@ -220,7 +220,7 @@ pass a tensor of shape `(num_items, feature_size, hidden_size of LM)` to the cor
 For Qwen2-VL and MiniCPM-V, we accept additional parameters alongside the embeddings:
-??? Code
+??? code
    ```python
    # Construct the prompt based on your model
@@ -288,7 +288,7 @@ vllm serve microsoft/Phi-3.5-vision-instruct --task generate \
 Then, you can use the OpenAI client as follows:
-??? Code
+??? code
    ```python
    from openai import OpenAI
@@ -366,7 +366,7 @@ vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf --task generate --max-model
 Then, you can use the OpenAI client as follows:
-??? Code
+??? code
    ```python
    from openai import OpenAI
@@ -430,7 +430,7 @@ vllm serve fixie-ai/ultravox-v0_5-llama-3_2-1b
 Then, you can use the OpenAI client as follows:
-??? Code
+??? code
    ```python
    import base64
@@ -486,7 +486,7 @@ Then, you can use the OpenAI client as follows:
 Alternatively, you can pass `audio_url`, which is the audio counterpart of `image_url` for image input:
-??? Code
+??? code
    ```python
    chat_completion_from_url = client.chat.completions.create(
@@ -531,7 +531,7 @@ pass a tensor of shape to the corresponding field of the multi-modal dictionary.
 For image embeddings, you can pass the base64-encoded tensor to the `image_embeds` field. The following example demonstrates how to pass image embeddings to the OpenAI server:
-??? Code
+??? code
    ```python
    image_embedding = torch.load(...)
diff --git a/docs/features/quantization/auto_awq.md b/docs/features/quantization/auto_awq.md
index 9f97ea406e25f..2361a27a499dd 100644
--- a/docs/features/quantization/auto_awq.md
+++ b/docs/features/quantization/auto_awq.md
@@ -15,7 +15,7 @@ pip install autoawq
 After installing AutoAWQ, you are ready to quantize a model. Please refer to the [AutoAWQ documentation](https://casper-hansen.github.io/AutoAWQ/examples/#basic-quantization) for further details. Here is an example of how to quantize `mistralai/Mistral-7B-Instruct-v0.2`:
-??? Code
+??? code
    ```python
    from awq import AutoAWQForCausalLM
@@ -51,7 +51,7 @@ python examples/offline_inference/llm_engine_example.py \
 AWQ models are also supported directly through the LLM entrypoint:
-??? Code
+??? code
    ```python
    from vllm import LLM, SamplingParams
diff --git a/docs/features/quantization/bitblas.md b/docs/features/quantization/bitblas.md
index c8f874ff84147..d1a431ddc9319 100644
--- a/docs/features/quantization/bitblas.md
+++ b/docs/features/quantization/bitblas.md
@@ -43,7 +43,7 @@ llm = LLM(
 ## Read gptq format checkpoint
-??? Code
+??? code
    ```python
    from vllm import LLM
diff --git a/docs/features/quantization/fp8.md b/docs/features/quantization/fp8.md
index b9ed668b2ef31..65b4285a5418b 100644
--- a/docs/features/quantization/fp8.md
+++ b/docs/features/quantization/fp8.md
@@ -58,7 +58,7 @@ For FP8 quantization, we can recover accuracy with simple RTN quantization. We r
 Since simple RTN does not require data for weight quantization and the activations are quantized dynamically, we do not need any calibration data for this quantization flow.
-??? Code
+??? code
    ```python
    from llmcompressor.transformers import oneshot
diff --git a/docs/features/quantization/gguf.md b/docs/features/quantization/gguf.md
index 102a3ee1ccccb..60b3bcd2a5aae 100644
--- a/docs/features/quantization/gguf.md
+++ b/docs/features/quantization/gguf.md
@@ -41,7 +41,7 @@ vllm serve ./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
 You can also use the GGUF model directly through the LLM entrypoint:
-??? Code
+??? code
    ```python
    from vllm import LLM, SamplingParams
diff --git a/docs/features/quantization/gptqmodel.md b/docs/features/quantization/gptqmodel.md
index 37bb02d4fb5bf..500803c208a40 100644
--- a/docs/features/quantization/gptqmodel.md
+++ b/docs/features/quantization/gptqmodel.md
@@ -31,7 +31,7 @@ After installing GPTQModel, you are ready to quantize a model. Please refer to t
 Here is an example of how to quantize `meta-llama/Llama-3.2-1B-Instruct`:
-??? Code
+??? code
    ```python
    from datasets import load_dataset
@@ -69,7 +69,7 @@ python examples/offline_inference/llm_engine_example.py \
 GPTQModel quantized models are also supported directly through the LLM entrypoint:
-??? Code
+??? code
    ```python
    from vllm import LLM, SamplingParams
diff --git a/docs/features/quantization/int4.md b/docs/features/quantization/int4.md
index 2008bef5c8a25..8d9fe46818ebf 100644
--- a/docs/features/quantization/int4.md
+++ b/docs/features/quantization/int4.md
@@ -53,7 +53,7 @@ When quantizing weights to INT4, you need sample data to estimate the weight upd
 It's best to use calibration data that closely matches your deployment data. For a general-purpose instruction-tuned model, you can use a dataset like `ultrachat`:
-??? Code
+??? code
    ```python
    from datasets import load_dataset
@@ -78,7 +78,7 @@ For a general-purpose instruction-tuned model, you can use a dataset like `ultra
 Now, apply the quantization algorithms:
-??? Code
+??? code
    ```python
    from llmcompressor.transformers import oneshot
@@ -141,7 +141,7 @@ lm_eval --model vllm \
 The following is an example of an expanded quantization recipe you can tune to your own use case:
-??? Code
+??? code
    ```python
    from compressed_tensors.quantization import (
diff --git a/docs/features/quantization/int8.md b/docs/features/quantization/int8.md
index 3a8f855aa0577..3635e841b8148 100644
--- a/docs/features/quantization/int8.md
+++ b/docs/features/quantization/int8.md
@@ -54,7 +54,7 @@ When quantizing activations to INT8, you need sample data to estimate the activa
 It's best to use calibration data that closely matches your deployment data. For a general-purpose instruction-tuned model, you can use a dataset like `ultrachat`:
-??? Code
+??? code
    ```python
    from datasets import load_dataset
@@ -81,7 +81,7 @@ For a general-purpose instruction-tuned model, you can use a dataset like `ultra
 Now, apply the quantization algorithms:
-??? Code
+??? code
    ```python
    from llmcompressor.transformers import oneshot
diff --git a/docs/features/quantization/modelopt.md b/docs/features/quantization/modelopt.md
index 39f2a78e705fc..39ae03b1bdac0 100644
--- a/docs/features/quantization/modelopt.md
+++ b/docs/features/quantization/modelopt.md
@@ -14,7 +14,7 @@ You can quantize HuggingFace models using the example scripts provided in the Te
 Below is an example showing how to quantize a model using modelopt's PTQ API:
-??? Code
+??? code
    ```python
    import modelopt.torch.quantization as mtq
@@ -50,7 +50,7 @@ with torch.inference_mode():
 The quantized checkpoint can then be deployed with vLLM. As an example, the following code shows how to deploy `nvidia/Llama-3.1-8B-Instruct-FP8`, which is the FP8 quantized checkpoint derived from `meta-llama/Llama-3.1-8B-Instruct`, using vLLM:
-??? Code
+??? code
    ```python
    from vllm import LLM, SamplingParams
diff --git a/docs/features/quantization/quantized_kvcache.md b/docs/features/quantization/quantized_kvcache.md
index 323dcb7d052d0..e76547d0e9c68 100644
--- a/docs/features/quantization/quantized_kvcache.md
+++ b/docs/features/quantization/quantized_kvcache.md
@@ -35,7 +35,7 @@ Studies have shown that FP8 E4M3 quantization typically only minimally degrades
 Here is an example of how to enable FP8 quantization:
-??? Code
+??? code
    ```python
    # To calculate kv cache scales on the fly enable the calculate_kv_scales
@@ -73,7 +73,7 @@ pip install llmcompressor
 Here's a complete example using `meta-llama/Llama-3.1-8B-Instruct` (most models can use this same pattern):
-??? Code
+??? code
    ```python
    from datasets import load_dataset
diff --git a/docs/features/quantization/quark.md b/docs/features/quantization/quark.md
index 77e3834954062..13afbc1e058e2 100644
--- a/docs/features/quantization/quark.md
+++ b/docs/features/quantization/quark.md
@@ -42,7 +42,7 @@ The Quark quantization process consists of the following 5 steps:
 Quark uses [Transformers](https://huggingface.co/docs/transformers/en/index) to fetch the model and tokenizer.
-??? Code
+??? code
    ```python
    from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -65,7 +65,7 @@ Quark uses the [PyTorch Dataloader](https://pytorch.org/tutorials/beginner/basic
 to load calibration data. For more details about how to use calibration datasets efficiently, please refer to [Adding Calibration Datasets](https://quark.docs.amd.com/latest/pytorch/calibration_datasets.html).
-??? Code
+??? code
    ```python
    from datasets import load_dataset
@@ -98,7 +98,7 @@ kv-cache and the quantization algorithm is AutoSmoothQuant.
 The AutoSmoothQuant config file for Llama is `examples/torch/language_modeling/llm_ptq/models/llama/autosmoothquant_config.json`.
-??? Code
+??? code
    ```python
    from quark.torch.quantization import (Config, QuantizationConfig,
@@ -145,7 +145,7 @@ HuggingFace `safetensors`, you can refer to [HuggingFace format exporting]
 (https://quark.docs.amd.com/latest/pytorch/export/quark_export_hf.html)
 for more exporting format details.
-??? Code
+??? code
    ```python
    import torch
@@ -176,7 +176,7 @@ for more exporting format details.
 Now, you can load and run the Quark quantized model directly through the LLM entrypoint:
-??? Code
+??? code
    ```python
    from vllm import LLM, SamplingParams
diff --git a/docs/features/quantization/torchao.md b/docs/features/quantization/torchao.md
index f8df3c4b08096..ab6802177048b 100644
--- a/docs/features/quantization/torchao.md
+++ b/docs/features/quantization/torchao.md
@@ -15,7 +15,7 @@ pip install \
 ## Quantizing HuggingFace Models
 You can quantize your own huggingface model with torchao, e.g. [transformers](https://huggingface.co/docs/transformers/main/en/quantization/torchao) and [diffusers](https://huggingface.co/docs/diffusers/en/quantization/torchao), and save the checkpoint to huggingface hub like [this](https://huggingface.co/jerryzh168/llama3-8b-int8wo) with the following example code:
-??? Code
+??? code
    ```python
    import torch
diff --git a/docs/features/reasoning_outputs.md b/docs/features/reasoning_outputs.md
index 2e6afe61663cb..90232a536cccc 100644
--- a/docs/features/reasoning_outputs.md
+++ b/docs/features/reasoning_outputs.md
@@ -33,7 +33,7 @@ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
 Next, make a request to the model that should return the reasoning content in the response.
-??? Code
+??? code
    ```python
    from openai import OpenAI
@@ -70,7 +70,7 @@ The `reasoning_content` field contains the reasoning steps that led to the final
 Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field in [chat completion response chunks](https://platform.openai.com/docs/api-reference/chat/streaming).
-??? Json +??? console "Json" ```json { @@ -95,7 +95,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client supports extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example: -??? Code +??? code ```python from openai import OpenAI @@ -152,7 +152,7 @@ Remember to check whether the `reasoning_content` exists in the response before The reasoning content is also available when both tool calling and the reasoning parser are enabled. Additionally, tool calling only parses functions from the `content` field, not from the `reasoning_content`. -??? Code +??? code ```python from openai import OpenAI @@ -200,7 +200,7 @@ For more examples, please refer to . -??? Code +??? code ```python # import the required packages @@ -258,7 +258,7 @@ You can add a new `ReasoningParser` similar to . -??? Code +??? code ```python @dataclass diff --git a/docs/features/spec_decode.md b/docs/features/spec_decode.md index f28a74ce2262a..e22cc65cae99e 100644 --- a/docs/features/spec_decode.md +++ b/docs/features/spec_decode.md @@ -18,7 +18,7 @@ Speculative decoding is a technique which improves inter-token latency in memory The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time. -??? Code +??? code ```python from vllm import LLM, SamplingParams @@ -62,7 +62,7 @@ python -m vllm.entrypoints.openai.api_server \ Then use a client: -??? Code +??? code ```python from openai import OpenAI @@ -103,7 +103,7 @@ Then use a client: The following code configures vLLM to use speculative decoding where proposals are generated by matching n-grams in the prompt. For more information read [this thread.](https://x.com/joao_gante/status/1747322413006643259) -??? Code +??? 
code ```python from vllm import LLM, SamplingParams @@ -137,7 +137,7 @@ draft models that conditioning draft predictions on both context vectors and sam For more information see [this blog](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) or [this technical report](https://arxiv.org/abs/2404.19124). -??? Code +??? code ```python from vllm import LLM, SamplingParams @@ -185,7 +185,7 @@ A variety of speculative models of this type are available on HF hub: The following code configures vLLM to use speculative decoding where proposals are generated by an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py). -??? Code +??? code ```python from vllm import LLM, SamplingParams diff --git a/docs/features/structured_outputs.md b/docs/features/structured_outputs.md index ea1d09644835f..c56ad400819bc 100644 --- a/docs/features/structured_outputs.md +++ b/docs/features/structured_outputs.md @@ -33,7 +33,7 @@ text. Now let´s see an example for each of the cases, starting with the `guided_choice`, as it´s the easiest one: -??? Code +??? code ```python from openai import OpenAI @@ -55,7 +55,7 @@ Now let´s see an example for each of the cases, starting with the `guided_choic The next example shows how to use the `guided_regex`. The idea is to generate an email address, given a simple regex template: -??? Code +??? code ```python completion = client.chat.completions.create( @@ -79,7 +79,7 @@ For this we can use the `guided_json` parameter in two different ways: The next example shows how to use the `guided_json` parameter with a Pydantic model: -??? Code +??? code ```python from pydantic import BaseModel @@ -127,7 +127,7 @@ difficult to use, but it´s really powerful. It allows us to define complete languages like SQL queries. 
It works by using a context free EBNF grammar. As an example, we can use to define a specific format of simplified SQL queries: -??? Code +??? code ```python simplified_sql_grammar = """ @@ -169,7 +169,7 @@ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r Note that you can use reasoning with any provided structured outputs feature. The following uses one with JSON schema: -??? Code +??? code ```python from pydantic import BaseModel @@ -212,7 +212,7 @@ For the following examples, vLLM was setup using `vllm serve meta-llama/Llama-3. Here is a simple example demonstrating how to get structured output using Pydantic models: -??? Code +??? code ```python from pydantic import BaseModel @@ -248,7 +248,7 @@ Age: 28 Here is a more complex example using nested Pydantic models to handle a step-by-step math solution: -??? Code +??? code ```python from typing import List @@ -308,7 +308,7 @@ These parameters can be used in the same way as the parameters from the Online Serving examples above. One example for the usage of the `choice` parameter is shown below: -??? Code +??? code ```python from vllm import LLM, SamplingParams diff --git a/docs/features/tool_calling.md b/docs/features/tool_calling.md index 8858b9a4015a4..13a8386a29719 100644 --- a/docs/features/tool_calling.md +++ b/docs/features/tool_calling.md @@ -15,7 +15,7 @@ vllm serve meta-llama/Llama-3.1-8B-Instruct \ Next, make a request to the model that should result in it using the available tools: -??? Code +??? code ```python from openai import OpenAI @@ -320,7 +320,7 @@ A tool parser plugin is a Python file containing one or more ToolParser implemen Here is a summary of a plugin file: -??? Code +??? 
code ```python diff --git a/docs/getting_started/installation/cpu.md b/docs/getting_started/installation/cpu.md index 5f2d0dbe27d34..15f183bccaa12 100644 --- a/docs/getting_started/installation/cpu.md +++ b/docs/getting_started/installation/cpu.md @@ -76,7 +76,7 @@ Currently, there are no pre-built CPU wheels. ### Build image from source -??? Commands +??? console "Commands" ```bash docker build -f docker/Dockerfile.cpu \ @@ -149,7 +149,7 @@ vllm serve facebook/opt-125m - If using vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread on each physical CPU core using `VLLM_CPU_OMP_THREADS_BIND` or using auto thread binding feature by default. On a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores: -??? Commands +??? console "Commands" ```console $ lscpu -e # check the mapping between logical CPU cores and physical CPU cores diff --git a/docs/getting_started/installation/gpu/rocm.inc.md b/docs/getting_started/installation/gpu/rocm.inc.md index 3765807ba21d5..560883d3caf9e 100644 --- a/docs/getting_started/installation/gpu/rocm.inc.md +++ b/docs/getting_started/installation/gpu/rocm.inc.md @@ -95,7 +95,7 @@ Currently, there are no pre-built ROCm wheels. 4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps: - ??? Commands + ??? console "Commands" ```bash pip install --upgrade pip @@ -206,7 +206,7 @@ DOCKER_BUILDKIT=1 docker build \ To run the above docker image `vllm-rocm`, use the below command: -??? Command +??? 
console "Command" ```bash docker run -it \ diff --git a/docs/getting_started/installation/intel_gaudi.md b/docs/getting_started/installation/intel_gaudi.md index 7a7a5a51c24cb..e1bba1eaba4a3 100644 --- a/docs/getting_started/installation/intel_gaudi.md +++ b/docs/getting_started/installation/intel_gaudi.md @@ -237,7 +237,7 @@ As an example, if a request of 3 sequences, with max sequence length of 412 come Warmup is an optional, but highly recommended step occurring before vLLM server starts listening. It executes a forward pass for each bucket with dummy data. The goal is to pre-compile all graphs and not incur any graph compilation overheads within bucket boundaries during server runtime. Each warmup step is logged during vLLM startup: -??? Logs +??? console "Logs" ```text INFO 08-01 22:26:47 hpu_model_runner.py:1066] [Warmup][Prompt][1/24] batch_size:4 seq_len:1024 free_mem:79.16 GiB @@ -286,7 +286,7 @@ When there's large amount of requests pending, vLLM scheduler will attempt to fi Each described step is logged by vLLM server, as follows (negative values correspond to memory being released): -??? Logs +??? console "Logs" ```text INFO 08-02 17:37:44 hpu_model_runner.py:493] Prompt bucket config (min, step, max_warmup) bs:[1, 32, 4], seq:[128, 128, 1024] diff --git a/docs/getting_started/quickstart.md b/docs/getting_started/quickstart.md index 39100e4ca5405..216e93ac05b89 100644 --- a/docs/getting_started/quickstart.md +++ b/docs/getting_started/quickstart.md @@ -147,7 +147,7 @@ curl http://localhost:8000/v1/completions \ Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API. For example, another way to query the server is via the `openai` Python package: -??? Code +??? code ```python from openai import OpenAI @@ -186,7 +186,7 @@ curl http://localhost:8000/v1/chat/completions \ Alternatively, you can use the `openai` Python package: -??? Code +??? 
code ```python from openai import OpenAI diff --git a/docs/mkdocs/stylesheets/extra.css b/docs/mkdocs/stylesheets/extra.css index 5df9f1344012f..fb44d9cdcf3d3 100644 --- a/docs/mkdocs/stylesheets/extra.css +++ b/docs/mkdocs/stylesheets/extra.css @@ -39,6 +39,8 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . :root { --md-admonition-icon--announcement: url('data:image/svg+xml;charset=utf-8,'); --md-admonition-icon--important: url('data:image/svg+xml;charset=utf-8,'); + --md-admonition-icon--code: url('data:image/svg+xml;charset=utf-8,'); + --md-admonition-icon--console: url('data:image/svg+xml;charset=utf-8,'); } .md-typeset .admonition.announcement, @@ -49,6 +51,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . .md-typeset details.important { border-color: rgb(239, 85, 82); } +.md-typeset .admonition.code, +.md-typeset details.code { + border-color: #64dd17 +} +.md-typeset .admonition.console, +.md-typeset details.console { + border-color: #64dd17 +} .md-typeset .announcement > .admonition-title, .md-typeset .announcement > summary { @@ -58,6 +68,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . .md-typeset .important > summary { background-color: rgb(239, 85, 82, 0.1); } +.md-typeset .code > .admonition-title, +.md-typeset .code > summary { + background-color: #64dd171a; +} +.md-typeset .console > .admonition-title, +.md-typeset .console > summary { + background-color: #64dd171a; +} .md-typeset .announcement > .admonition-title::before, .md-typeset .announcement > summary::before { @@ -71,6 +89,18 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . 
-webkit-mask-image: var(--md-admonition-icon--important); mask-image: var(--md-admonition-icon--important); } +.md-typeset .code > .admonition-title::before, +.md-typeset .code > summary::before { + background-color: #64dd17; + -webkit-mask-image: var(--md-admonition-icon--code); + mask-image: var(--md-admonition-icon--code); +} +.md-typeset .console > .admonition-title::before, +.md-typeset .console > summary::before { + background-color: #64dd17; + -webkit-mask-image: var(--md-admonition-icon--console); + mask-image: var(--md-admonition-icon--console); +} /* Make label fully visible on hover */ .md-content__button[href*="edit"]:hover::after { diff --git a/docs/models/generative_models.md b/docs/models/generative_models.md index fd5c659921de3..53469245f01b1 100644 --- a/docs/models/generative_models.md +++ b/docs/models/generative_models.md @@ -85,7 +85,7 @@ and automatically applies the model's [chat template](https://huggingface.co/doc In general, only instruction-tuned models have a chat template. Base models may perform poorly as they are not trained to respond to the chat conversation. -??? Code +??? code ```python from vllm import LLM diff --git a/docs/models/supported_models.md b/docs/models/supported_models.md index f427968c8258f..dd9672cc8ab4b 100644 --- a/docs/models/supported_models.md +++ b/docs/models/supported_models.md @@ -642,7 +642,7 @@ Specified using `--task generate`. For the best results, we recommend using the following dependency versions (tested on A10 and L40): - ??? Dependency versions + ??? 
code "Dependency versions" ```text # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40) diff --git a/docs/serving/integrations/langchain.md b/docs/serving/integrations/langchain.md index 1a24ab29c19c5..4783d4fa0b426 100644 --- a/docs/serving/integrations/langchain.md +++ b/docs/serving/integrations/langchain.md @@ -13,7 +13,7 @@ pip install langchain langchain_community -q To run inference on a single or multiple GPUs, use `VLLM` class from `langchain`. -??? Code +??? code ```python from langchain_community.llms import VLLM diff --git a/docs/serving/openai_compatible_server.md b/docs/serving/openai_compatible_server.md index 5371e45d80529..2d6e064a3fa8c 100644 --- a/docs/serving/openai_compatible_server.md +++ b/docs/serving/openai_compatible_server.md @@ -15,7 +15,7 @@ vllm serve NousResearch/Meta-Llama-3-8B-Instruct \ To call the server, in your preferred text editor, create a script that uses an HTTP client. Include any messages that you want to send to the model. Then run that script. Below is an example script using the [official OpenAI Python client](https://github.com/openai/openai-python). -??? Code +??? code ```python from openai import OpenAI @@ -146,7 +146,7 @@ completion = client.chat.completions.create( Only `X-Request-Id` HTTP request header is supported for now. It can be enabled with `--enable-request-id-headers`. -??? Code +??? code ```python completion = client.chat.completions.create( @@ -185,7 +185,7 @@ Code example: The following [sampling parameters][sampling-params] are supported. -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:completion-sampling-params" @@ -193,7 +193,7 @@ The following [sampling parameters][sampling-params] are supported. The following extra parameters are supported: -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:completion-extra-params" @@ -217,7 +217,7 @@ Code example: The following [sampling parameters][sampling-params] are supported. -??? 
Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-sampling-params" @@ -225,7 +225,7 @@ The following [sampling parameters][sampling-params] are supported. The following extra parameters are supported: -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-extra-params" @@ -268,7 +268,7 @@ and passing a list of `messages` in the request. Refer to the examples below for Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library: - ??? Code + ??? code ```python import requests @@ -327,7 +327,7 @@ The following [pooling parameters][pooling-params] are supported. The following extra parameters are supported by default: -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params" @@ -335,7 +335,7 @@ The following extra parameters are supported by default: For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead: -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params" @@ -358,7 +358,7 @@ Code example: The following [sampling parameters][sampling-params] are supported. -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params" @@ -366,7 +366,7 @@ The following [sampling parameters][sampling-params] are supported. The following extra parameters are supported: -??? Code +??? code ```python --8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params" @@ -446,7 +446,7 @@ curl -v "http://127.0.0.1:8000/classify" \ }' ``` -??? Response +??? console "Response" ```bash { @@ -494,7 +494,7 @@ curl -v "http://127.0.0.1:8000/classify" \ }' ``` -??? Response +??? console "Response" ```bash { @@ -564,7 +564,7 @@ curl -X 'POST' \ }' ``` -??? Response +??? 
console "Response" ```bash { @@ -589,7 +589,7 @@ You can pass a string to `text_1` and a list to `text_2`, forming multiple sente where each pair is built from `text_1` and a string in `text_2`. The total number of pairs is `len(text_2)`. -??? Request +??? console "Request" ```bash curl -X 'POST' \ @@ -606,7 +606,7 @@ The total number of pairs is `len(text_2)`. }' ``` -??? Response +??? console "Response" ```bash { @@ -634,7 +634,7 @@ You can pass a list to both `text_1` and `text_2`, forming multiple sentence pai where each pair is built from a string in `text_1` and the corresponding string in `text_2` (similar to `zip()`). The total number of pairs is `len(text_2)`. -??? Request +??? console "Request" ```bash curl -X 'POST' \ @@ -655,7 +655,7 @@ The total number of pairs is `len(text_2)`. }' ``` -??? Response +??? console "Response" ```bash { @@ -716,7 +716,7 @@ Code example: Note that the `top_n` request parameter is optional and will default to the length of the `documents` field. Result documents will be sorted by relevance, and the `index` property can be used to determine original order. -??? Request +??? console "Request" ```bash curl -X 'POST' \ @@ -734,7 +734,7 @@ Result documents will be sorted by relevance, and the `index` property can be us }' ``` -??? Response +??? console "Response" ```bash { diff --git a/docs/usage/metrics.md b/docs/usage/metrics.md index 4350ab5025f5c..fa379003c0b2b 100644 --- a/docs/usage/metrics.md +++ b/docs/usage/metrics.md @@ -12,7 +12,7 @@ vllm serve unsloth/Llama-3.2-1B-Instruct Then query the endpoint to get the latest metrics from the server: -??? Output +??? console "Output" ```console $ curl http://0.0.0.0:8000/metrics @@ -33,7 +33,7 @@ Then query the endpoint to get the latest metrics from the server: The following metrics are exposed: -??? Code +??? 
code ```python --8<-- "vllm/engine/metrics.py:metrics-definitions" diff --git a/docs/usage/troubleshooting.md b/docs/usage/troubleshooting.md index 2b7abc7f46dff..2d008488ad1eb 100644 --- a/docs/usage/troubleshooting.md +++ b/docs/usage/troubleshooting.md @@ -60,7 +60,7 @@ To identify the particular CUDA operation that causes the error, you can add `-- If GPU/CPU communication cannot be established, you can use the following Python script and follow the instructions below to confirm whether the GPU/CPU communication is working correctly. -??? Code +??? code ```python # Test PyTorch NCCL @@ -170,7 +170,7 @@ WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously or an error from Python that looks like this: -??? Logs +??? console "Logs" ```console RuntimeError: @@ -214,7 +214,7 @@ if __name__ == '__main__': vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](gh-pr:10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script: -??? Code +??? code ```python import torch diff --git a/docs/usage/usage_stats.md b/docs/usage/usage_stats.md index 78d2a6784bc5a..e78c67522f61b 100644 --- a/docs/usage/usage_stats.md +++ b/docs/usage/usage_stats.md @@ -10,7 +10,7 @@ The list of data collected by the latest version of vLLM can be found here: