xinyun/vllm (mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-05-04 02:17:51 +08:00)
vllm/vllm/platforms
Latest commit: c20ef40fd0 by Akshat Tripathi, 2025-05-07 16:28:47 -04:00
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357) | 2025-05-07 00:07:30 -07:00 |
| cpu.py | Add full API docs and improve the UX of navigating them (#17485) | 2025-05-03 19:42:43 -07:00 |
| cuda.py | Add full API docs and improve the UX of navigating them (#17485) | 2025-05-03 19:42:43 -07:00 |
| hpu.py | [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) | 2025-04-11 07:38:36 -07:00 |
| interface.py | [Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238) | 2025-05-07 16:28:47 -04:00 |
| neuron.py | Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357) | 2025-05-07 00:07:30 -07:00 |
| rocm.py | [Quantization] Quark MXFP4 format loading (#16943) | 2025-05-07 15:05:05 -04:00 |
| tpu.py | [Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238) | 2025-05-07 16:28:47 -04:00 |
| xpu.py | [Hardware] add platform-specific request validation api (#16291) | 2025-04-09 12:50:01 -07:00 |
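The modules listed above each implement one hardware backend (CPU, CUDA, HPU, Neuron, ROCm, TPU, XPU) behind the common interface defined in interface.py, and the package selects the backend that matches the host at import time. A minimal sketch of that registry-and-detection pattern follows; the class and function names here are illustrative assumptions, not vLLM's actual API:

```python
# Hypothetical sketch of a platform registry with runtime detection,
# in the spirit of this directory; names are not vLLM's real API.
from abc import ABC, abstractmethod


class Platform(ABC):
    """Common interface every hardware backend implements (cf. interface.py)."""

    device_name: str = "unknown"

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if this hardware is present on the host."""

    def validate_request(self, request: dict) -> None:
        """Hook for platform-specific request validation (cf. the xpu.py commit)."""


class CudaPlatform(Platform):
    device_name = "cuda"

    def is_available(self) -> bool:
        # Probe for a usable GPU; fall through cleanly if torch is absent.
        try:
            import torch
            return torch.cuda.is_available()
        except ImportError:
            return False


class CpuPlatform(Platform):
    device_name = "cpu"

    def is_available(self) -> bool:
        return True  # CPU is always a valid fallback


def detect_platform() -> Platform:
    """Return the first available backend, preferring accelerators."""
    for platform in (CudaPlatform(), CpuPlatform()):
        if platform.is_available():
            return platform
    raise RuntimeError("no supported platform found")
```

Putting detection behind a single `detect_platform()`-style entry point keeps hardware-specific imports lazy, so importing the package on a CPU-only host never touches CUDA libraries.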