Author | Commit | Message | Date
Cyrus Leung | 7025b11d94 | [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) | 2024-08-13 05:33:41 +00:00
Isotr0py | 360bd67cf0 | [Core] Support loading GGUF model (#5191) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-08-05 17:54:23 -06:00
xuyi | 1d2e7fb73f | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00
Alphi | 2f4e108f75 | [Bugfix] Clean up MiniCPM-V (#6939) (Co-authored-by: hezhihui <hzh7269@modelbest.cn>, Cyrus Leung <cyrus.tl.leung@gmail.com>) | 2024-07-31 14:39:19 +00:00
Isotr0py | 7cbd9ec7a9 | [Model] Initialize support for InternVL2 series models (#6514) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-07-29 10:16:30 +00:00
Michael Goin | 978aed5300 | [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) | 2024-07-16 15:31:32 -07:00
Qubitium-ModelCloud | ee93f4f92a | [CORE] Quantized lm-head Framework (#4442) (Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>, ZX <zx@lbx.dev>) | 2024-07-02 22:25:17 +00:00
Murali Andoorveedu | c5832d2ae9 | [Core] Pipeline Parallel Support (#4412) (Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>) | 2024-07-02 10:58:08 -07:00
Cyrus Leung | 98cf2ed678 | [Model][Bugfix] Implicit model flags and reenable Phi-3-Vision (#5896) | 2024-06-27 09:08:10 -07:00
Cyrus Leung | 96354d6a29 | [Model] Add base class for LoRA-supported models (#5018) | 2024-06-27 16:03:04 +08:00
Michael Goin | da971ec7a5 | [Model] Add FP8 kv cache for Qwen2 (#5656) | 2024-06-19 09:38:26 +00:00
Zhuohan Li | 1102bef219 | [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) (Co-authored-by: rsnm2 <rshaw@neuralmagic.com>, Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>) | 2024-05-27 15:18:17 -07:00
Cody Yu | a3a73ab069 | [Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893) (The 2nd PR for #4532; supports loading FP8 kv-cache scaling factors from an FP8 checkpoint with a .kv_scale parameter.) | 2024-05-22 13:28:20 -07:00
HUANG Fei | d130b573a0 | [Model] add rope_scaling support for qwen2 (#4930) | 2024-05-21 05:22:22 +00:00
Woosuk Kwon | 0fca3cdcf2 | [Misc] Enhance attention selector (#4751) | 2024-05-13 10:47:25 -07:00
Cody Yu | a62aaf1df5 | [Misc][Refactor] Generalize linear_method to be quant_method (#4373) | 2024-04-26 16:41:14 -04:00
Antoni Baum | 69e1d2fb69 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00
youkaichao | 63e7176f26 | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) (Moves vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators.) | 2024-04-10 15:33:30 -07:00
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00
Woosuk Kwon | 925f3332ca | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00
Roy | ea5f14e6ff | [Bugfix][Model] Fix Qwen2 (#3554) | 2024-03-22 00:18:58 +00:00
Roy | f1c0fc3919 | Migrate logits computation and gather to model_runner (#3233) | 2024-03-20 23:25:01 +00:00
Yang Fan | a7c871680e | Fix tie_word_embeddings for Qwen2. (#3344) | 2024-03-15 09:36:53 -07:00
Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
whyiug | c59e120c55 | Feature add lora support for Qwen2 (#3177) | 2024-03-07 21:58:24 -08:00
Woosuk Kwon | 2daf23ab0c | Separate attention backends (#3005) | 2024-03-07 01:45:50 -08:00
Junyang Lin | 2832e7b9f9 | fix names and license for Qwen2 (#2589) | 2024-01-24 22:37:51 -08:00
Junyang Lin | 94b5edeb53 | Add qwen2 (#2495) | 2024-01-22 14:34:21 -08:00