Mirror of https://git.datalinker.icu/vllm-project/vllm.git
vllm/vllm/model_executor
Latest commit: e93f4cc9e3 by Tao He, 2025-09-11 15:32:09 +08:00
Add support for the Qwen3-Next model (a hybrid attention model) (#24526)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Name                   Last commit                                                                                             Date
layers/                Add support for the Qwen3-Next model (a hybrid attention model) (#24526)                               2025-09-11 15:32:09 +08:00
model_loader/          [Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre (#24469)  2025-09-10 23:10:01 -07:00
models/                Add support for the Qwen3-Next model (a hybrid attention model) (#24526)                               2025-09-11 15:32:09 +08:00
warmup/                [Perf] Warmup FlashInfer attention during startup (#23439)                                             2025-09-10 15:03:17 -07:00
__init__.py            [Misc] Add SPDX-FileCopyrightText (#19100)                                                              2025-06-03 11:20:17 -07:00
custom_op.py           [V0 deprecation] Deprecate V0 Neuron backend (#21159)                                                   2025-09-06 16:15:18 -07:00
parameter.py           [Core] Allow disabling TP sharding for parallel Linear layer (#23024)                                   2025-09-05 22:53:58 -07:00
sampling_metadata.py   [Doc] Fix typos in Python comments (#24042)                                                             2025-09-01 19:07:45 -07:00
utils.py               [Quantization] Enable BNB support for InternS1 (#21953)                                                  2025-08-01 11:09:54 +00:00