vllm/v1 at ffb740ae95518450c533dda4d614b6d24701a96e - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-30 01:23:31 +08:00

Lucas Wilkinson ffb740ae95 manually manage stream

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

2025-05-22 20:51:36 +00:00

attention

support MLA

2025-05-22 20:51:35 +00:00

core

[BugFix] Fix handling of num_computed_tokens with connector (#18232)

2025-05-19 09:03:25 -07:00

engine

[KVConnector] Keep KVTransferParams as a dict (#18033)

2025-05-14 08:05:57 -07:00

executor

[BugFix] Avoid secondary missing MultiprocExecutor.workers error (#17811)

2025-05-07 21:55:04 +00:00

metrics

[Misc] Add Ray Prometheus logger to V1 (#17925)

2025-05-16 01:02:42 -07:00

sample

[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777)

2025-05-16 15:14:03 -07:00

spec_decode

[Misc] Add Ray Prometheus logger to V1 (#17925)

2025-05-16 01:02:42 -07:00

structured_output

[V1] Structured Outputs + Thinking compatibility (#16577)

2025-05-14 15:45:24 -07:00

worker

manually manage stream

2025-05-22 20:51:36 +00:00

__init__.py

[V1] AsyncLLM Implementation (#9826)

2024-11-11 23:05:38 +00:00

kv_cache_interface.py

[v1] Support multiple KV cache groups in GPU model runner (#17945)

2025-05-14 18:54:54 -07:00

outputs.py

[P/D] NIXL Integration (#17751)

2025-05-12 09:46:16 -07:00

request.py

fix: typos (#18151)

2025-05-15 02:16:15 -07:00

serial_utils.py

[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging (#17860)

2025-05-08 16:57:35 +00:00

utils.py

[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)

2025-05-13 10:48:21 -07:00