zixuanzhang226
|
d746268e92
|
[Model] support bitsandbytes quantization with minicpm model (#10842)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
|
2024-12-03 03:06:41 +00:00 |
|
Michael Goin
|
4433195ab7
|
[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753)
|
2024-12-03 02:26:15 +00:00 |
|
Isotr0py
|
4c05edb33a
|
[Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-02 23:06:09 +00:00 |
|
Jani Monoses
|
9b14d978aa
|
Fix openvino on GPU (#10793)
|
2024-12-02 18:52:19 +00:00 |
|
Yan Ma
|
519cc6ca12
|
[Misc][XPU] Avoid torch compile for XPU platform (#10747)
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 17:53:55 +00:00 |
|
Jee Jee Li
|
b45f0d7946
|
[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-02 17:53:36 +00:00 |
|
youkaichao
|
a4c4daf364
|
[misc] use out argument for flash attention (#10822)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 10:50:10 +00:00 |
|
Cyrus Leung
|
e95f275f57
|
[CI/Build] Update mistral_common version for tests and docs (#10825)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-02 10:26:10 +00:00 |
|
zhou fan
|
ef31eabc68
|
[Model]: add some tests for aria model (#10770)
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 05:36:36 +00:00 |
|
wangxiyuan
|
995a148575
|
[doc] Update config docstring (#10732)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-02 04:14:45 +00:00 |
|
youkaichao
|
63a164172d
|
[misc] remove xverse modeling file (#10814)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 03:27:13 +00:00 |
|
Maximilien de Bayser
|
e25810ae29
|
Fill TorchSDPAAttentionMetadata seq_lens_field for prefill (#10799)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-12-02 10:05:32 +08:00 |
|
Woosuk Kwon
|
073a4bd1c0
|
[Kernel] Use out arg in flash_attn_varlen_func (#10811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-01 17:55:39 -08:00 |
|
cduk
|
b7954776fd
|
[core] Avoid metrics log noise when idle - include speculative decodi… (#10809)
|
2024-12-02 01:49:48 +00:00 |
|
Isotr0py
|
b18c9bbaba
|
[Model] Add BNB support to Llava and Pixtral-HF (#10795)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 01:31:09 +00:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in the 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Roger Wang
|
c11f172187
|
[Misc] Adding MMMU-Pro vision dataset to serving benchmark (#10804)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-01 08:47:05 +00:00 |
|
youkaichao
|
169a0ff911
|
[doc] add warning about comparing hf and vllm outputs (#10805)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-01 00:41:38 -08:00 |
|
Cyrus Leung
|
d2f058e76c
|
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 14:36:51 +08:00 |
|
Cyrus Leung
|
f877a7d12a
|
[Misc] Improve type annotations for support_torch_compile (#10763)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-30 17:48:35 -08:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
wangxiyuan
|
7e4bbda573
|
[doc] format fix (#10789)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-11-30 11:38:40 +00:00 |
|
Patrick von Platen
|
e7cfc4ef4c
|
[Interleaved ATTN] Support for Mistral-8B (#10591)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-30 07:45:50 +00:00 |
|
Isotr0py
|
16ee07f22a
|
[Model] Refactor Molmo weights loading to use AutoWeightsLoader (#10771)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-30 04:19:14 +00:00 |
|
Nicolò Lucchesi
|
40bc242579
|
[Bugfix] Fix OpenVino/Neuron driver_worker init (#10779)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-11-30 12:07:13 +08:00 |
|
wangxiyuan
|
661175bc82
|
[platform] Add verify_quantization in platform. (#10757)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-11-29 15:22:21 +00:00 |
|
Jee Jee Li
|
3132aac043
|
[Bugfix] Fix Idefics3 bug (#10778)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-29 13:56:46 +00:00 |
|
wang.yuqi
|
c82b432d4a
|
[Misc] typo find in sampling_metadata.py (#10740)
|
2024-11-29 05:17:57 +00:00 |
|
Cyrus Leung
|
fa6ecb9aa7
|
[Model] Clean up MiniCPMV (#10751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-29 04:47:06 +00:00 |
|
Isotr0py
|
c83919c7a6
|
[Model] Add Internlm2 LoRA support (#5064)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-28 17:29:04 +00:00 |
|
Woosuk Kwon
|
98f47f2a40
|
[V1] Optimize the CPU overheads in FlashAttention custom op (#10733)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 09:01:02 -08:00 |
|
Woosuk Kwon
|
8c1e77fb58
|
[Kernel] Update vllm-flash-attn version to reduce CPU overheads (#10742)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 08:31:28 -08:00 |
|
sixgod
|
5fc5ce0fe4
|
[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-11-28 14:53:31 +00:00 |
|
Richard Liu
|
3ed5e73146
|
[TPU] Update requirements-tpu (#10726)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2024-11-28 02:30:48 -08:00 |
|
Woosuk Kwon
|
9a8bff0285
|
[Kernel] Update vllm-flash-attn version (#10736)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 02:25:59 -08:00 |
|
Woosuk Kwon
|
a79b122400
|
[V1] Do not allocate beyond the max_model_len (#10730)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 00:13:15 -08:00 |
|
Ricky Xu
|
d9b4b3f069
|
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-27 23:59:28 -08:00 |
|
罗泽轩
|
278be671a3
|
[Doc] Update model in arch_overview.rst to match comment (#10701)
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
|
2024-11-27 23:58:39 -08:00 |
|
zixuanzhang226
|
70dc14fbd0
|
[Model] support bitsandbytes quantization with minicpm3 model (#10682)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
|
2024-11-27 23:58:02 -08:00 |
|
youkaichao
|
cb4e1c3f3a
|
[misc] upgrade filelock version (#10731)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-27 19:54:58 -08:00 |
|
tomeras91
|
395b1c7454
|
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-11-27 13:21:10 -08:00 |
|
Cyrus Leung
|
9b4b150395
|
[Bugfix] Ignore lm_head when loading embedding models (#10719)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-27 19:05:29 +00:00 |
|
Mor Zusman
|
197b4484a3
|
[Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-11-27 19:02:27 +00:00 |
|
Isotr0py
|
b98c62ba49
|
[Bugfix] Fix GGUF inference with FP16 unquantized checkpoint (#10675)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-27 10:43:17 -08:00 |
|
youkaichao
|
c411def234
|
[torch.compile] fix shape specialization (#10722)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-27 10:16:10 -08:00 |
|
youkaichao
|
308cc5e21e
|
[ci] fix slow tests (#10698)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-27 09:26:14 -08:00 |
|
Roger Wang
|
9e0a147d50
|
[V1] Update interface for mistral-format Pixtral (#10703)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-27 12:26:27 +00:00 |
|
Li, Jiang
|
418cb3b93f
|
[Bugfix][Hardware][CPU] Fix intel-omp version to avoid segfault (#10700)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-27 11:55:38 +00:00 |
|
shunxing12345
|
1209261e93
|
[Model] Support telechat2 (#10311)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-11-27 11:32:35 +00:00 |
|
Tyler Michael Smith
|
e2251109c7
|
[Kernel] Remove if-else with identical branches in marlin 2:4 (#10687)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-26 22:55:32 -08:00 |
|