vllm/layers at f780504d1294cbe28221d9d030b040384fa53d5d - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-02 19:51:18 +08:00

Chenhui Zhang f780504d12

fix weight loading for GQA with TP (#2379)

2024-01-15 15:43:59 -08:00

..

Add GPTQ support (#916)

2023-12-15 03:04:22 -08:00

__init__.py

Change the name to vLLM (#150)

2023-06-17 03:07:40 -07:00

activation.py

Add PyTorch-native implementation of custom layers (#1898)

2023-12-02 21:18:40 -08:00

attention.py

[Minor] Remove unused code in attention (#2384)

2024-01-08 13:13:08 -08:00

layernorm.py

Add PyTorch-native implementation of custom layers (#1898)

2023-12-02 21:18:40 -08:00

linear.py

fix weight loading for GQA with TP (#2379)

2024-01-15 15:43:59 -08:00

rejection_sampler.py

[Speculative decoding 1/9] Optimized rejection sampler (#2336)

2024-01-09 15:38:41 -08:00

rotary_embedding.py

Add PyTorch-native implementation of custom layers (#1898)

2023-12-02 21:18:40 -08:00

sampler.py

Aligning top_p and top_k Sampling (#1885)

2024-01-12 22:51:03 +01:00

vocab_parallel_embedding.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00