xinyun/vllm
mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-28 12:37:14 +08:00
vllm/vllm/model_executor/layers
Latest commit: b983ba35bd, fix marlin config repr (#3414), Enrique Shockwave, 2024-03-14 16:26:19 -07:00
attention/                      Re-enable the 80 char line width limit (#3305)    2024-03-10 19:49:14 -07:00
fused_moe/                      [Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389)    2024-03-14 08:11:48 +00:00
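The fused_moe kernel that this commit tunes for A100/H100 at tensor-parallel sizes 2, 4 and 8 fuses expert routing with the expert GEMMs. As a rough reference for the routing half, here is a plain-PyTorch sketch of top-k gating; the function name and signature are illustrative, not vLLM's actual API.

```python
import torch

def topk_gate(hidden: torch.Tensor, gate_weight: torch.Tensor, top_k: int = 2):
    """Pick top_k experts per token and return normalized routing weights.

    hidden:      [num_tokens, hidden_size]
    gate_weight: [num_experts, hidden_size]
    Illustrative only; a fused MoE kernel combines this routing with the
    expert GEMMs in a single pass.
    """
    logits = hidden @ gate_weight.t()                      # [num_tokens, num_experts]
    weights, expert_ids = torch.topk(torch.softmax(logits, dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the selected experts
    return weights, expert_ids
```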
quantization/                   fix marlin config repr (#3414)    2024-03-14 16:26:19 -07:00
__init__.py                    Change the name to vLLM (#150)    2023-06-17 03:07:40 -07:00
activation.py                  Add kernel for GeGLU with approximate GELU (#3337)    2024-03-12 22:06:17 -07:00
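The GeGLU-with-approximate-GELU kernel added in #3337 computes a gated activation: the input is split in half along the last dimension, and one half, passed through the tanh-approximate GELU, gates the other. A plain-PyTorch sketch of that math (the real implementation is a fused CUDA kernel):

```python
import math
import torch

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))

def geglu_tanh(x: torch.Tensor) -> torch.Tensor:
    # GeGLU: split the last dimension in half, gate one half with GELU of the other.
    a, b = x.chunk(2, dim=-1)
    return gelu_tanh(a) * b
```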
layernorm.py                   Revert "Refactor llama family models (#2637)" (#2851)    2024-02-13 09:24:59 -08:00
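layernorm.py carries the RMS normalization (RMSNorm) used by the Llama-family models that the reverted refactor above touched. A minimal RMSNorm sketch for reference; vLLM dispatches to a fused kernel rather than this eager version.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square LayerNorm: scale by the inverse RMS, no mean subtraction."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps) * self.weight
```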
linear.py                      [Minor] Fix bias in if to remove ambiguity (#3259)    2024-03-13 09:16:55 -07:00
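The linear.py fix in #3259 is described as removing ambiguity from an `if` on `bias`. A likely reading (an assumption about the intent, not a quote of the diff) is the usual PyTorch pitfall that truth-testing a tensor is ambiguous, so presence of an optional bias should be tested with `is not None`:

```python
import torch

bias = torch.zeros(4096)  # optional bias tensor of a linear layer

# Ambiguous: truth-testing a multi-element tensor raises
# "Boolean value of Tensor with more than one element is ambiguous".
# if bias:
#     ...

# Unambiguous: test for presence explicitly.
if bias is not None:
    print("bias will be added to the output")
```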
rejection_sampler.py           [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103)    2024-03-08 23:32:46 -08:00
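Rejection sampling in speculative decoding accepts a draft token x with probability min(1, p(x)/q(x)), where p is the target-model distribution and q the draft-model distribution, and on rejection resamples from the normalized residual max(0, p - q). A single-token sketch of that rule; vLLM's rejection_sampler.py implements the batched, multi-token version.

```python
import torch

def rejection_sample_step(p: torch.Tensor, q: torch.Tensor, draft_token: int) -> int:
    """One step of speculative-decoding rejection sampling.

    p: target-model probabilities over the vocab, shape [vocab_size]
    q: draft-model probabilities over the vocab,  shape [vocab_size]
    """
    # Accept the draft token with probability min(1, p / q).
    accept_prob = torch.clamp(p[draft_token] / q[draft_token], max=1.0)
    if torch.rand(()) < accept_prob:
        return draft_token
    # Otherwise resample from the normalized positive residual max(0, p - q).
    residual = torch.clamp(p - q, min=0.0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, num_samples=1))
```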
rotary_embedding.py            Fix lint (#3388)    2024-03-13 13:56:49 -07:00
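rotary_embedding.py implements rotary position embeddings (RoPE), which rotate pairs of query/key dimensions by position-dependent angles. A minimal NeoX-style sketch; the names are illustrative, and the real module precomputes and caches the cos/sin tables.

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding over the last dimension of x.

    x:         [num_tokens, head_dim] with head_dim even
    positions: [num_tokens] integer positions
    """
    half = x.shape[-1] // 2
    inv_freq = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = positions.float()[:, None] * inv_freq[None, :]   # [num_tokens, half]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```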
sampler.py                     Re-enable the 80 char line width limit (#3305)    2024-03-10 19:49:14 -07:00
vocab_parallel_embedding.py    Remove hardcoded device="cuda" to support more devices (#2503)    2024-02-01 15:46:39 -08:00
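vocab_parallel_embedding.py shards the embedding table across tensor-parallel ranks by vocabulary range; the commit above drops the hardcoded device="cuda" so tensors follow the layer's own device. A single-rank sketch of the masking trick, with hypothetical class and argument names:

```python
import torch
import torch.nn as nn

class VocabShardEmbedding(nn.Module):
    """One tensor-parallel shard of an embedding table (illustrative only)."""

    def __init__(self, vocab_start: int, vocab_end: int, hidden_size: int):
        super().__init__()
        self.vocab_start, self.vocab_end = vocab_start, vocab_end
        self.weight = nn.Parameter(torch.empty(vocab_end - vocab_start, hidden_size))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mask ids that belong to other shards; stay on token_ids.device
        # instead of a hardcoded "cuda" so CPU and other backends work too.
        in_shard = (token_ids >= self.vocab_start) & (token_ids < self.vocab_end)
        local_ids = torch.where(in_shard, token_ids - self.vocab_start, torch.zeros_like(token_ids))
        out = nn.functional.embedding(local_ids, self.weight)
        out = out * in_shard.unsqueeze(-1).to(out.dtype)
        # In the real tensor-parallel setting, an all-reduce across ranks would follow here.
        return out
```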