vllm / tests / quantization

Latest commit: 6ffa3f314c by Cyrus Leung, [CI/Build] Avoid CUDA initialization (#8534), 2024-09-18 10:38:11 +00:00
File                       | Last commit                                                                  | Date
-------------------------- | ---------------------------------------------------------------------------- | --------------------------
__init__.py                | …                                                                            |
test_bitsandbytes.py       | [Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)  | 2024-09-17 08:09:12 -07:00
test_compressed_tensors.py | [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)     | 2024-09-11 09:46:46 -07:00
test_configs.py            | [Kernel][Core] Add AWQ support to the Marlin kernel (#6612)                  | 2024-07-21 19:41:42 -04:00
test_cpu_offload.py        | [ci][test] adjust max wait time for cpu offloading test (#7709)              | 2024-08-20 17:12:44 -07:00
test_experts_int8.py       | [Kernel] W8A16 Int8 inside FusedMoE (#7415)                                  | 2024-08-16 10:06:51 -07:00
test_fp8.py                | [CI/Build] Avoid CUDA initialization (#8534)                                 | 2024-09-18 10:38:11 +00:00
test_lm_head.py            | [Core] Support loading GGUF model (#5191)                                    | 2024-08-05 17:54:23 -06:00
utils.py                   | [CI/Build] Avoid CUDA initialization (#8534)                                 | 2024-09-18 10:38:11 +00:00
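Each file above exercises one quantization-related path in vLLM (bitsandbytes, compressed-tensors, AWQ via the Marlin kernel, CPU offloading, experts-int8, fp8, and quantized lm_head). As a rough orientation only, here is a minimal sketch of the pattern such tests follow, built on vLLM's public LLM API; the model name, prompt, and assertion are illustrative assumptions, not code taken from these files:

```python
# Minimal sketch of a vLLM quantization test (assumptions: the model
# choice, prompt, and assertion are illustrative, not taken from
# vllm/tests/quantization).
from vllm import LLM, SamplingParams

def test_fp8_generation():
    # Load a small model with on-the-fly fp8 quantization; the model
    # name is a hypothetical stand-in for whatever the real test uses.
    llm = LLM(model="facebook/opt-125m", quantization="fp8")
    params = SamplingParams(temperature=0.0, max_tokens=8)
    outputs = llm.generate(["Hello, my name is"], params)
    # The real tests also inspect weight dtypes and numerics; here we
    # only assert that the quantized path produces non-empty text.
    assert outputs[0].outputs[0].text
```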