Cody Yu
aa48e502fb
[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327)
2024-07-12 12:04:26 -07:00
Cyrus Leung
9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests (#5966)
2024-06-30 12:58:49 +08:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs (#5623)
2024-06-18 13:10:04 +00:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests (#5307)
2024-06-06 05:17:18 -05:00
Tyler Michael Smith
260d119e86
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)
2024-06-01 06:45:32 +00:00
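The refactor above changes the kernel calling convention: quantization scales are passed as device tensors rather than host scalars. A minimal reference sketch of that contract, with a hypothetical wrapper standing in for vLLM's actual CUTLASS binding:

```python
import torch

def scaled_mm_reference(a_q: torch.Tensor, b_q: torch.Tensor,
                        scale_a: torch.Tensor, scale_b: torch.Tensor) -> torch.Tensor:
    # Contract after the refactor: per-tensor scales live on the same
    # device as the quantized operands, so the kernel epilogue can read
    # them without a host-to-device copy on the hot path.
    assert scale_a.device == a_q.device and scale_b.device == b_q.device
    # Reference math only: dequantize each operand, then matmul.
    return (a_q.float() * scale_a) @ (b_q.float() * scale_b)
```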
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
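After this consolidation, prompts reach the engine through one argument rather than parallel prompt/prompt_token_ids parameters. A rough sketch of the resulting call shapes (the model name is a placeholder, and details may differ from the merged API):

```python
from vllm import LLM

llm = LLM(model="facebook/opt-125m")

# Plain text, a text-prompt dict, and pre-tokenized input all flow
# through the same consolidated prompt argument.
llm.generate("Hello, world")
llm.generate({"prompt": "Hello, world"})
llm.generate({"prompt_token_ids": [1, 2, 3]})
```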
Michael Goin
757b62c495
[CI/Build] Codespell ignore build/ directory (#4945)
2024-05-21 09:06:10 -07:00
SangBin Cho
2e9a2227ec
[LoRA] Support long-context LoRA (#4787)
...
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.
It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.
Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
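A minimal sketch of the mechanism described above, not the actual CUDA kernel: keep one cos/sin cache per RoPE scaling factor and index it per token, so a single batched call can serve requests whose LoRAs use different scaling factors. All names and values here are illustrative.

```python
import torch

def build_cos_sin_cache(head_dim: int, max_pos: int, scaling: float) -> torch.Tensor:
    # Linear RoPE scaling: positions are divided by the scaling factor.
    inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(max_pos).float() / scaling
    freqs = torch.outer(t, inv_freq)
    return torch.cat([freqs.cos(), freqs.sin()], dim=-1)  # [max_pos, head_dim]

# Hypothetical setup: two LoRAs, one at the base context and one at 4x.
scaling_factors = [1.0, 4.0]
caches = torch.stack([build_cos_sin_cache(128, 8192, s) for s in scaling_factors])

def gather_cos_sin(positions: torch.Tensor, factor_ids: torch.Tensor) -> torch.Tensor:
    # factor_ids maps each token to its LoRA's scaling factor, so one
    # batched lookup replaces a separate kernel launch per LoRA.
    return caches[factor_ids, positions]
```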
Michael Goin
d627a3d837
[Misc] Upgrade to torch==2.3.0 (#4454)
2024-04-29 20:05:47 -04:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string operations in logging (#4326)
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
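The pattern this lint enforces, in generic form: pass format arguments to the logger instead of pre-building the string, so formatting is skipped whenever the level is filtered out.

```python
import logging

logger = logging.getLogger(__name__)
num_seqs = 8  # placeholder value

# Eager: the f-string is always built, even if INFO is disabled.
logger.info(f"Scheduled {num_seqs} sequences")

# Lazy: %-formatting runs only when the record is actually emitted.
logger.info("Scheduled %d sequences", num_seqs)
```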
SangBin Cho
0ae11f78ab
[Mypy] Part 3: Fix typing for nested directories in most of the codebase (#4161)
2024-04-22 21:32:44 -07:00
SangBin Cho
09473ee41c
[mypy] Add mypy type annotations, part 1 (#4006)
2024-04-12 14:35:50 -07:00
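For context, the kind of change these mypy passes make: explicit parameter and return annotations so the checker can verify call sites. A generic example rather than a line from the PR:

```python
from typing import Optional

def get_max_len(seq_lens: list[int], cap: Optional[int] = None) -> int:
    # With annotations, mypy can flag callers that pass the wrong types
    # or mishandle the int return value.
    longest = max(seq_lens)
    return min(longest, cap) if cap is not None else longest
```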
youkaichao
ca81ff5196
[Core] Manage NCCL via a PyPI package & upgrade to PyTorch 2.2.1 (#3805)
2024-04-04 10:26:19 -07:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
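The core idea of a scaled e4m3fn KV cache, sketched with a per-tensor scale. This is an illustration of the arithmetic, not the ROCm kernel from the PR, and it assumes a PyTorch build with float8 support (2.1+).

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_kv(kv: torch.Tensor, scale: float) -> torch.Tensor:
    # Divide by the scale so values fit e4m3fn's narrow dynamic range,
    # then store the cache at one byte per element.
    return (kv / scale).to(torch.float8_e4m3fn)

def dequantize_kv(kv_fp8: torch.Tensor, scale: float) -> torch.Tensor:
    return kv_fp8.to(torch.float16) * scale

kv = torch.randn(4, 128, dtype=torch.float16)
scale = kv.abs().max().item() / E4M3_MAX
kv_restored = dequantize_kv(quantize_kv(kv, scale), scale)
```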
Roger Wang
45b6ef6513
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)
2024-03-27 13:39:26 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort (#3495)
2024-03-25 07:59:47 -07:00
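What isort enforces, briefly: imports grouped as standard library, third-party, then first-party, with each group alphabetized and separated by a blank line. An illustrative layout:

```python
import os
import time

import torch

from vllm.logger import init_logger
```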
bnellnm
9fdf3de346
CMake-based build system (#2830)
2024-03-18 15:38:33 -07:00
Ronen Schaffer
14e3f9a1b2
Replace lstrip() with removeprefix() to fix Ruff linter warning (#2958)
2024-03-15 21:01:30 -07:00
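The reason for this lint (Ruff's B005 check): str.lstrip() removes a set of characters, not a literal prefix, so it can silently eat extra characters; str.removeprefix(), added in Python 3.9, matches the prefix exactly.

```python
# lstrip() treats its argument as a character set, not a prefix:
"test_suite".lstrip("test_")        # -> "uite"  (the "s" of "suite" is eaten)

# removeprefix() strips the exact prefix, at most once:
"test_suite".removeprefix("test_")  # -> "suite"
```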
Zhuohan Li
2f8844ba08
Re-enable the 80-char line width limit (#3305)
2024-03-10 19:49:14 -07:00
Massimiliano Pronesti
93dc5a2870
chore(vllm): codespell for spell checking (#2820)
2024-02-21 18:56:01 -08:00
Woosuk Kwon
b0a1d667b0
Pin PyTorch & xformers versions (#2155)
2023-12-17 01:46:54 -08:00
Woosuk Kwon
f3e024bece
[CI/CD] Upgrade PyTorch version to v2.1.1 (#2045)
2023-12-11 17:48:11 -08:00
Allen
f07c1ceaa5
[FIX] Fix Docker build error (#1831) (#1832)
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-11-29 23:06:50 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff (#1665)
2023-11-20 11:58:01 -08:00
Zhuohan Li
06458a0b42
Upgrade to CUDA 12 (#1527)
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-11-08 14:17:49 -08:00
yanxiyue
6a6119554c
Lock torch version to 2.0.1 (#1290)
2023-10-10 09:21:57 -07:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution (#140)
2023-06-05 20:03:14 -07:00