vllm/model_executor at 75d29cf4e1d7e950c2308b12e944b507fb3e1916 - vllm

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-04-27 11:07:04 +08:00

Wentao Ye 75d29cf4e1

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

2025-07-25 17:07:07 -07:00

guided_decoding

[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)

2025-07-10 15:30:26 -04:00

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

2025-07-25 17:07:07 -07:00

[Bugfix] fix modelscope snapshot_download serialization (#21536)

2025-07-24 22:44:38 -07:00

Add support for Prithvi in Online serving mode (#21518)

2025-07-25 07:01:27 -07:00

__init__.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

custom_op.py

[V0 deprecation] Remove V0 HPU backend (#21131)

2025-07-17 16:37:36 -07:00

parameter.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

pooling_metadata.py

[Model][1/N] Support multiple poolers at model level (#21227)

2025-07-21 02:22:21 -07:00

sampling_metadata.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

utils.py

[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046)

2025-07-01 19:20:34 +09:00