vllm/distributed at 4ff61ababa25f4a519185013c9cce00142341f04 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-09 19:27:20 +08:00

Woosuk Kwon 7f280d69c9

[Optimization] Cache sampled token ids in model runner (#20291)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

2025-07-01 11:01:31 -07:00

device_communicators

[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)

2025-06-28 22:06:38 +00:00

[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)

2025-06-26 15:30:21 -07:00

[Optimization] Cache sampled token ids in model runner (#20291)

2025-07-01 11:01:31 -07:00

__init__.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

communication_op.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

kv_events.py

feat: add data parallel rank to KVEventBatch (#18925)

2025-06-03 17:14:20 -07:00

parallel_state.py

[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank (#19516)

2025-07-01 06:02:09 +00:00

tpu_distributed_utils.py

[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)

2025-06-03 00:06:20 +00:00

utils.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00