vllm/spec_decode at b5945d49c08b66658110fa1c63e55fde66fcfad7 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-28 13:47:09 +08:00

Lucas Wilkinson abe93bce59

[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>

2025-12-09 17:18:10 -08:00

test_eagle.py

[ROCm][CI] Fix test_max_len.py for Rocm (#29916)

2025-12-08 16:58:30 -05:00

test_max_len.py

[CI] Fix Flaky test_eagle_max_len Test (#30306)

2025-12-09 07:33:34 +00:00

test_mtp.py

Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)

2025-12-07 00:00:22 -08:00

test_ngram.py

Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)

2025-12-07 00:00:22 -08:00

test_speculators_eagle3.py

[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001)

2025-12-04 07:52:28 +00:00

test_tree_attention.py

[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)

2025-12-09 17:18:10 -08:00