vllm/spec_decode at 186352b2703652141df75bc2c012a784706e8572 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-15 13:47:14 +08:00

Jialin Ouyang 186352b270

[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

2025-11-14 16:04:04 -08:00

test_eagle.py

[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)

2025-11-14 16:04:04 -08:00

test_max_len.py

[Bugfix] Spec decode + structured output + spec model max len edge case (#28298)

2025-11-08 19:44:25 +00:00

test_mtp.py

[Attention] Refactor CUDA attention backend selection logic (#24794)

2025-11-11 07:40:44 -05:00

test_ngram.py

[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)

2025-11-14 16:04:04 -08:00

test_speculators_eagle3.py

[Speculators] Move tests + fix integration (#27308)

2025-10-29 00:54:21 -07:00

test_tree_attention.py

[Attention] Refactor CUDA attention backend selection logic (#24794)

2025-11-11 07:40:44 -05:00