Simon Mo | aceb17cf2d | 2024-04-14 14:35:55 -07:00
[Docs] document that mixtral 8x22b is supported (#4073)

Nick Hill | 563c54f760 | 2024-04-14 14:12:42 -07:00
[BugFix] Fix tensorizer extra in setup.py (#4072)

youkaichao | 2cd6b4f362 | 2024-04-13 23:40:21 -07:00
[Core] avoid too many cuda context by caching p2p test (#4021)

Sanger Steel | 711a000255 | 2024-04-13 17:13:01 -07:00
[Frontend] [Core] feat: Add model loading using tensorizer (#3476)

Jee Li | 989ae2538d | 2024-04-13 07:55:05 -07:00
[Kernel] Add punica dimension for Baichuan-13B (#4053)

zspo | 0a430b4ae2 | 2024-04-13 07:54:03 -07:00
[Bugfix] fix_small_bug_in_neuron_executor (#4051)

zspo | ec8e3c695f | 2024-04-13 07:52:36 -07:00
[Bugfix] fix_log_time_in_metrics (#4050)

youkaichao | 98afde19fc | 2024-04-13 07:12:53 -07:00
[Core][Distributed] improve logging for init dist (#4042)

Dylan Hawk | 5c2e66e487 | 2024-04-12 21:07:04 -07:00
[Bugfix] More type hint fixes for py 3.8 (#4039)

youkaichao | 546e721168 | 2024-04-13 01:43:37 +00:00
[CI/Test] expand ruff and yapf for all supported python version (#4037)

Jee Li | b8aacac31a | 2024-04-12 16:56:37 -07:00
[Bugfix] Fix LoRA bug (#4032)

Bellk17 | d04973ad54 | 2024-04-12 16:41:26 -07:00
Fix triton compilation issue (#3984)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

youkaichao | fbb9d9eef4 | 2024-04-12 16:40:39 -07:00
[Core] fix custom allreduce default value (#4040)

SangBin Cho | 09473ee41c | 2024-04-12 14:35:50 -07:00
[mypy] Add mypy type annotation part 1 (#4006)

Zhuohan Li | d4ec9ffb95 | 2024-04-12 13:56:04 -07:00
[Misc] Fix typo in scheduler.py (#4022)

youkaichao | 96b6a6d790 | 2024-04-12 19:35:44 +00:00
[Bugfix] fix type hint for py 3.8 (#4036)

SangBin Cho | 36729bac13 | 2024-04-12 09:56:57 -07:00
[Test] Test multiple attn backend for chunked prefill. (#4023)

Cyrus Leung | 7fd3949a0b | 2024-04-12 05:30:54 +00:00
[Frontend][Core] Move merge_async_iterators to utils (#4026)

Jee Li | 1096717ae9 | 2024-04-11 21:02:44 -07:00
[Core] Support LoRA on quantized models (#4012)

Michael Feil | c2b4a1bce9 | 2024-04-11 17:17:21 -07:00
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

Nick Hill | e46a60aa4c | 2024-04-11 15:34:12 -07:00
[BugFix] Fix handling of stop strings and stop token ids (#3672)

Antoni Baum | 1e96c3341a | 2024-04-11 22:18:57 +00:00
Add extra punica sizes to support bigger vocabs (#4015)

Dylan Hawk | 95e7d4a97c | 2024-04-11 22:15:50 +00:00
Fix echo/logprob OpenAI completion bug (#3441)
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>

youkaichao | 559eb852f8 | 2024-04-11 14:00:48 -07:00
[Core] init_distributed_environment align with init_process_group(#4014)
[Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)

Antoni Baum | a10d3056da | 2024-04-11 16:35:51 -04:00
[Core] Set linear_weights directly on the layer (#3977)

bigPYJ1151 | 8afca50889 | 2024-04-11 11:56:49 -07:00
[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)

fuchen.ljl | 08ccee1e83 | 2024-04-11 08:59:26 -07:00
punica fix-bgmv-kernel-640 (#4007)

Roger Wang | c1dc547129 | 2024-04-11 07:50:00 -07:00
[Kernel] Fused MoE Config for Mixtral 8x22 (#4002)

youkaichao | f3d0bf7589 | 2024-04-11 03:33:02 +00:00
[Doc][Installation] delete python setup.py develop (#3989)

Kunshang Ji | e9da5a40c6 | 2024-04-10 20:26:07 -07:00
[Misc] Add indirection layer for custom ops (#3913)

SangBin Cho | e42df7227d | 2024-04-11 03:09:50 +00:00
[Test] Add xformer and flash attn tests (#3961)
Co-authored-by: Simon Mo <simon.mo@hey.com>

youkaichao | caada5e50a | 2024-04-11 01:48:26 +00:00
[Core][Model] torch.compile for layernorm in commandr (#3985)
[Core][Model] Use torch.compile to accelerate layernorm in commandr (#3985)

SangBin Cho | 67b4221a61 | 2024-04-10 17:56:48 -07:00
[Core][5/N] Fully working chunked prefill e2e (#3884)

youkaichao | 63e7176f26 | 2024-04-10 15:33:30 -07:00
[Core][Refactor] move parallel_utils into vllm/distributed (#3950)
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)

Travis Johnson | 934d3662f7 | 2024-04-10 22:28:25 +00:00
[Bugfix] handle hf_config with architectures == None (#3982)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>

Frαnçois | 92cd2e2f21 | 2024-04-10 18:05:52 +00:00
[Doc] Fix getting started to use publicly available model (#3963)

Daniel E Marasco | e4c4072c94 | 2024-04-10 10:15:51 -07:00
[Bugfix] Remove key sorting for guided_json parameter in OpenAi compatible Server (#3945)

youkaichao | e35397468f | 2024-04-10 17:03:02 +00:00
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

James Whedbee | 8b317c6dd0 | 2024-04-10 08:12:00 -07:00
[Model][AMD] ROCm support for 256 head dims for Gemma (#3972)

Woosuk Kwon | bd3c144e0b | 2024-04-10 07:37:17 -07:00
[Bugfix][ROCm] Add numba to Dockerfile.rocm (#3962)

Travis Johnson | 0258b7a94b | 2024-04-10 01:39:56 -07:00
[Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty (#3876)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

胡译文 | b3104b2a10 | 2024-04-10 00:09:36 -07:00
[Bugfix] Fix logits processor when prompt_logprobs is not None (#3899)

zhaotyer | c2e00af523 | 2024-04-10 04:49:11 +00:00
[Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955)
Co-authored-by: tianyi_zhao <tianyi.zhao@transwarp.io>

Zedong Peng | c013d32c75 | 2024-04-09 21:30:03 -07:00
[Benchmark] Add cpu options to bench scripts (#3915)

Jee Li | 11dd6ebb89 | 2024-04-09 19:47:15 -07:00
[Misc] Avoid loading incorrect LoRA config (#3777)

Juan Villamizar | 6c0b04515f | 2024-04-09 15:10:47 -07:00
[ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm (#3643)
Co-authored-by: jpvillam <jpvillam@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Junichi Sato | e23a43aef8 | 2024-04-09 12:11:31 -07:00
[Bugfix] Fix KeyError on loading GPT-NeoX (#3925)

Cade Daniel | e7c7067b45 | 2024-04-09 11:44:15 -07:00
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)

youkaichao | 6d592eb430 | 2024-04-09 08:49:02 +00:00
[Core] separate distributed_init from worker (#3904)

Roy | d036198e23 | 2024-04-09 06:17:21 +08:00
[BugFix][Model] Fix commandr RoPE max_position_embeddings (#3919)