Nick Hill
|
e46a60aa4c
|
[BugFix] Fix handling of stop strings and stop token ids (#3672)
|
2024-04-11 15:34:12 -07:00 |
|
Antoni Baum
|
1e96c3341a
|
Add extra punica sizes to support bigger vocabs (#4015)
|
2024-04-11 22:18:57 +00:00 |
|
Dylan Hawk
|
95e7d4a97c
|
Fix echo/logprob OpenAI completion bug (#3441)
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
|
2024-04-11 22:15:50 +00:00 |
|
youkaichao
|
559eb852f8
|
[Core] init_distributed_environment align with init_process_group (#4014)
[Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)
|
2024-04-11 14:00:48 -07:00 |
|
Antoni Baum
|
a10d3056da
|
[Core] Set linear_weights directly on the layer (#3977)
|
2024-04-11 16:35:51 -04:00 |
|
bigPYJ1151
|
8afca50889
|
[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)
|
2024-04-11 11:56:49 -07:00 |
|
fuchen.ljl
|
08ccee1e83
|
punica fix-bgmv-kernel-640 (#4007)
|
2024-04-11 08:59:26 -07:00 |
|
Roger Wang
|
c1dc547129
|
[Kernel] Fused MoE Config for Mixtral 8x22 (#4002)
|
2024-04-11 07:50:00 -07:00 |
|
youkaichao
|
f3d0bf7589
|
[Doc][Installation] delete python setup.py develop (#3989)
|
2024-04-11 03:33:02 +00:00 |
|
Kunshang Ji
|
e9da5a40c6
|
[Misc] Add indirection layer for custom ops (#3913)
|
2024-04-10 20:26:07 -07:00 |
|
SangBin Cho
|
e42df7227d
|
[Test] Add xformer and flash attn tests (#3961)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-04-11 03:09:50 +00:00 |
|
youkaichao
|
caada5e50a
|
[Core][Model] torch.compile for layernorm in commandr (#3985)
[Core][Model] Use torch.compile to accelerate layernorm in commandr (#3985)
|
2024-04-11 01:48:26 +00:00 |
|
SangBin Cho
|
67b4221a61
|
[Core][5/N] Fully working chunked prefill e2e (#3884)
|
2024-04-10 17:56:48 -07:00 |
|
youkaichao
|
63e7176f26
|
[Core][Refactor] move parallel_utils into vllm/distributed (#3950)
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
|
2024-04-10 15:33:30 -07:00 |
|
Travis Johnson
|
934d3662f7
|
[Bugfix] handle hf_config with architectures == None (#3982)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-04-10 22:28:25 +00:00 |
|
Frαnçois
|
92cd2e2f21
|
[Doc] Fix getting started to use publicly available model (#3963)
|
2024-04-10 18:05:52 +00:00 |
|
Daniel E Marasco
|
e4c4072c94
|
[Bugfix] Remove key sorting for guided_json parameter in OpenAI compatible Server (#3945)
|
2024-04-10 10:15:51 -07:00 |
|
youkaichao
|
e35397468f
|
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-04-10 17:03:02 +00:00 |
|
James Whedbee
|
8b317c6dd0
|
[Model][AMD] ROCm support for 256 head dims for Gemma (#3972)
|
2024-04-10 08:12:00 -07:00 |
|
Woosuk Kwon
|
bd3c144e0b
|
[Bugfix][ROCm] Add numba to Dockerfile.rocm (#3962)
|
2024-04-10 07:37:17 -07:00 |
|
Travis Johnson
|
0258b7a94b
|
[Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty (#3876)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-04-10 01:39:56 -07:00 |
|
胡译文
|
b3104b2a10
|
[Bugfix] Fix logits processor when prompt_logprobs is not None (#3899)
|
2024-04-10 00:09:36 -07:00 |
|
zhaotyer
|
c2e00af523
|
[Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955)
Co-authored-by: tianyi_zhao <tianyi.zhao@transwarp.io>
|
2024-04-10 04:49:11 +00:00 |
|
Zedong Peng
|
c013d32c75
|
[Benchmark] Add cpu options to bench scripts (#3915)
|
2024-04-09 21:30:03 -07:00 |
|
Jee Li
|
11dd6ebb89
|
[Misc] Avoid loading incorrect LoRA config (#3777)
|
2024-04-09 19:47:15 -07:00 |
|
Juan Villamizar
|
6c0b04515f
|
[ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm (#3643)
Co-authored-by: jpvillam <jpvillam@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-04-09 15:10:47 -07:00 |
|
Junichi Sato
|
e23a43aef8
|
[Bugfix] Fix KeyError on loading GPT-NeoX (#3925)
|
2024-04-09 12:11:31 -07:00 |
|
Cade Daniel
|
e7c7067b45
|
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
|
2024-04-09 11:44:15 -07:00 |
|
youkaichao
|
6d592eb430
|
[Core] separate distributed_init from worker (#3904)
|
2024-04-09 08:49:02 +00:00 |
|
Roy
|
d036198e23
|
[BugFix][Model] Fix commandr RoPE max_position_embeddings (#3919)
|
2024-04-09 06:17:21 +08:00 |
|
Matt Wong
|
59a6abf3c9
|
[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)
|
2024-04-08 14:31:02 -07:00 |
|
Kiran R
|
bc0c0192d1
|
[Bugfix] Enable Proper attention_bias Usage in Llama Model Configuration (#3767)
Co-authored-by: roy <jasonailu87@gmail.com>
|
2024-04-08 19:42:35 +00:00 |
|
egortolmachev
|
f46864d68d
|
[Bugfix] Added Command-R GPTQ support (#3849)
Co-authored-by: Egor Tolmachev <t333ga@gmail.com>
|
2024-04-08 14:59:38 +00:00 |
|
ywfang
|
b4543c8f6b
|
[Model] add minicpm (#3893)
|
2024-04-08 18:28:36 +08:00 |
|
Isotr0py
|
0ce0539d47
|
[Bugfix] Fix Llava inference with Tensor Parallelism. (#3883)
|
2024-04-07 22:54:13 +08:00 |
|
youkaichao
|
2f19283549
|
[Core] latency optimization (#3890)
|
2024-04-06 19:14:06 -07:00 |
|
youkaichao
|
95baec828f
|
[Core] enable out-of-tree model register (#3871)
|
2024-04-06 17:11:41 -07:00 |
|
youkaichao
|
e4be7d70bb
|
[CI/Benchmark] add more iteration and use median for robust latency benchmark (#3889)
|
2024-04-06 21:32:30 +00:00 |
|
Isotr0py
|
54951ac4bf
|
[Bugfix] Fix incorrect output on OLMo models in Tensor Parallelism (#3869)
|
2024-04-05 12:02:09 -07:00 |
|
SangBin Cho
|
18de883489
|
[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853)
|
2024-04-05 10:17:58 -07:00 |
|
Thomas Parnell
|
1d7c940d74
|
Add option to completion API to truncate prompt tokens (#3144)
|
2024-04-05 10:15:42 -07:00 |
|
Woosuk Kwon
|
cfaf49a167
|
[Misc] Define common requirements (#3841)
|
2024-04-05 00:39:17 -07:00 |
|
Noam Gat
|
9edec652e2
|
[Bugfix] Fixing requirements.txt (#3865)
|
2024-04-04 23:46:01 -07:00 |
|
Cade Daniel
|
e0dd4d3589
|
[Misc] Fix linter issues in examples/fp8/quantizer/quantize.py (#3864)
|
2024-04-04 21:57:33 -07:00 |
|
Cade Daniel
|
e5043a3e75
|
[Misc] Add pytest marker to opt-out of global test cleanup (#3863)
|
2024-04-04 21:54:16 -07:00 |
|
youkaichao
|
d03d64fd2e
|
[CI/Build] refactor dockerfile & fix pip cache
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)
|
2024-04-04 21:53:16 -07:00 |
|
Sean Gallen
|
78107fa091
|
[Doc] Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-04-04 21:52:01 -07:00 |
|
youkaichao
|
c391e4b68e
|
[Core] improve robustness of pynccl (#3860)
|
2024-04-04 16:52:12 -07:00 |
|
Saurabh Dash
|
9117f892f0
|
[Model] Cohere CommandR+ (#3829)
|
2024-04-04 13:31:49 -07:00 |
|
Michael Goin
|
db2a6a41e2
|
[Hardware][CPU] Update cpu torch to match default of 2.2.1 (#3854)
|
2024-04-04 19:49:49 +00:00 |
|