vllm/v1 at 4716377fbc1887f27732b3816bd010a6809e41bc - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 07:57:11 +08:00

rongfu.leng 4716377fbc

[Feature] Estimate max-model-len use available KV cache memory (#16168)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

2025-04-08 19:12:51 -07:00

[Feature] Estimate max-model-len use available KV cache memory (#16168)

2025-04-08 19:12:51 -07:00

[V1] Implement sliding window attention in kv_cache_manager (#14097)

2025-04-01 00:33:17 -07:00

[V1][BugFix] Exit properly if engine core fails during startup (#16137)

2025-04-07 15:30:15 -07:00

[V1] Fix json_object support with xgrammar (#15488)

2025-04-02 02:00:08 -07:00

[V1][Sampler] Faster top-k only implementation (#15478)

2025-03-26 10:56:47 -07:00

[V1][Spec Decode] Respect prompt_lookup_max (#15348)

2025-03-23 10:41:44 -07:00

structured_output

[CI] xgrammar structured output supports Enum. (#15757)

2025-03-29 20:20:02 -07:00

[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)

2025-04-03 14:23:28 -07:00

[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)

2025-03-20 17:50:43 -07:00

__init__.py

[V1] AsyncLLM Implementation (#9826)

2024-11-11 23:05:38 +00:00

test_async_llm_dp.py

[V1] AsyncLLM data parallel (#13923)

2025-03-27 16:14:41 -07:00

test_oracle.py

[Misc] Enable V1 LoRA by default (#15320)

2025-04-01 16:53:56 +08:00

test_stats.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

test_utils.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00