xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-06-14 17:17:25 +08:00)
vllm/vllm/model_executor/layers/quantization/compressed_tensors
Latest commit: cbe94391eb, Rahul Tuli, 2025-01-15 17:41:24 +08:00
Fix: cases with empty sparsity config (#12057)
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Name | Last commit | Date
schemes | [TPU][Quantization] TPU W8A8 (#11785) | 2025-01-08 19:33:29 +00:00
__init__.py | [Kernel] Initial Activation Quantization Support (#4525) | 2024-05-23 21:29:18 +00:00
compressed_tensors_moe.py | [Misc] Move print_*_once from utils to logger (#11298) | 2025-01-09 12:48:12 +08:00
compressed_tensors.py | Fix: cases with empty sparsity config (#12057) | 2025-01-15 17:41:24 +08:00
triton_scaled_mm.py | [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (#11698) | 2025-01-08 20:23:15 +00:00
utils.py | [Bugfix] update should_ignore_layer (#11354) | 2024-12-21 06:36:23 +00:00