xinyun/vllm
mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-08 04:57:03 +08:00
vllm/tests/v1
Latest commit: 16484d394c by Roger Wang, 2025-12-16 17:15:49 -08:00
[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
(cherry picked from commit f5f51e5931ffd99afe69696b60765b88d3eb13f2)
Name                  Last commit                                                                            Date
..
attention             …
core                  [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)   2025-12-16 17:15:49 -08:00
cudagraph             …
determinism           …
distributed           …
e2e                   …
ec_connector          [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)   2025-12-16 17:15:49 -08:00
engine                …
entrypoints           fix: Update json features supported by xGrammar (#30390)                               2025-12-14 02:16:06 -08:00
executor              …
kv_connector          [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420)         2025-12-14 15:38:28 +00:00
kv_offload            CPU KV Offloading: Use more CUDA streams (#29013)                                      2025-12-14 23:50:45 +00:00
logits_processors     …
metrics               …
sample                [bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)                       2025-12-12 09:03:35 -08:00
shutdown              …
spec_decode           …
structured_output     fix: Update json features supported by xGrammar (#30390)                               2025-12-14 02:16:06 -08:00
tpu                   …
tracing               …
worker                …
__init__.py           …
test_oracle.py        …
test_outputs.py       …
test_request.py       …
test_serial_utils.py  …
utils.py              …