youkaichao
803f37eaaa
[6/N] torch.compile rollout to users ( #10437 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-19 10:09:03 -08:00
youkaichao
a03ea40792
[3/N][torch.compile] consolidate custom op logging ( #10399 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-18 15:14:59 -08:00
youkaichao
51bb12d17b
[4/N][torch.compile] clean up set_torch_compile_backend ( #10401 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-17 23:57:20 -08:00
youkaichao
4fd9375028
[2/N][torch.compile] make compilation cfg part of vllm cfg ( #10383 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-16 18:02:14 -08:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00
Cyrus Leung
2ac6d0e75b
[Misc] Consolidate pooler config overrides ( #10351 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
Cyrus Leung
972112d82f
[Bugfix] Fix unable to load some models ( #10312 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-14 16:55:54 -08:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor ( #10044 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
Umesh
8a06428c70
[LoRA] Adds support for bias in LoRA ( #5733 )
...
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
2024-11-12 11:08:40 -08:00
youkaichao
eea55cca5b
[1/N] torch.compile user interface design ( #10237 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 18:01:06 -08:00
Russell Bryant
9cdba9669c
[Doc] Update help text for --distributed-executor-backend ( #10231 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-12 09:55:09 +08:00
Robert Shaw
6ace6fba2c
[V1] AsyncLLM Implementation ( #9826 )
...
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00
Cyrus Leung
51c2e1fcef
[CI/Build] Split up models tests ( #10069 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
Krishna Mandal
b09895a618
[Frontend][Core] Override HF config.json via CLI ( #5836 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 16:19:27 +00:00
sroy745
f6778620a9
Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 ( #10136 )
...
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-08 15:56:18 +00:00
Nicolò Lucchesi
9d43afcc53
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding ( #9291 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2024-11-07 08:15:14 -08:00
Flávia Béo
aa9078fa03
Adds method to read the pooling types from model's files ( #9506 )
...
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 08:42:40 +00:00
Cyrus Leung
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config ( #10104 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend ( #6143 )
...
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
youkaichao
8d72bb20fa
[4/N] make quant config first-class citizen ( #9978 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-04 08:51:31 -08:00
Chauncey
ac6b8f19b9
[Frontend] Multi-Modality Support for Loading Local Image Files ( #9915 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-04 15:34:57 +00:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner ( #9938 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
Roger Wang
3ea2dc2ec4
[Misc] Remove deprecated arg for cuda graph capture ( #9864 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-10-31 07:22:07 +00:00
Went-Liang
81f09cfd80
[Model] Support math-shepherd-mistral-7b-prm model ( #9697 )
...
Signed-off-by: Went-Liang <wenteng_liang@163.com>
2024-10-30 09:33:42 -07:00
科英
74fc2d77ae
[Misc] Add metrics for request queue time, forward time, and execute time ( #9659 )
2024-10-29 10:32:56 -07:00
wangshuai09
4e2d95e372
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
Mengqing Cao
5cbdccd151
[Hardware][openvino] is_openvino --> current_platform.is_openvino ( #9716 )
2024-10-26 10:59:06 +00:00
Vinay R Damodaran
33bab41060
[Bugfix]: Make chat content text allow type content ( #9358 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Mengqing Cao
2394962d70
[Hardware][XPU] using current_platform.is_xpu ( #9605 )
2024-10-23 08:28:21 +00:00
xendo
9dbcce84a7
[Neuron] [Bugfix] Fix neuron startup ( #9374 )
...
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-10-22 12:51:41 +00:00
Nick Hill
15713e3b75
[BugFix] Update draft model TP size check to allow matching target TP size ( #9394 )
...
Co-authored-by: Baoyuan Qi <qibaoyuan@126.com>
2024-10-21 14:14:29 -07:00
Cyrus Leung
263d8ee150
[Bugfix] Fix missing task for speculative decoding ( #9524 )
2024-10-19 06:49:40 +00:00
Cyrus Leung
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Russell Bryant
776dbd74f1
[CI/Build] mypy: Resolve some errors from checking vllm/engine ( #9267 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-16 22:55:59 +00:00
Patrick von Platen
415f76a9cb
Support mistral interleaved attn ( #9414 )
2024-10-16 13:28:30 +00:00
Cyrus Leung
7abba39ee6
[Model] VLM2Vec, the first multimodal embedding model in vLLM ( #9303 )
2024-10-16 14:31:00 +08:00
Cyrus Leung
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL ( #9250 )
2024-10-16 13:56:17 +08:00
Kunshang Ji
4141608c6a
[Hardware][intel GPU] add async output process for xpu ( #8897 )
2024-10-14 12:23:33 -06:00
Wallas Henrique
8baf85e4e9
[Doc] Compatibility matrix for mutual exclusive features ( #8512 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
Tyler Michael Smith
7342a7d7f8
[Model] Support Mamba ( #6484 )
2024-10-11 15:40:06 +00:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 ( #9149 )
2024-10-10 14:17:17 +08:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support ( #7168 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
sroy745
f5d72b2fc6
[Core] Make BlockSpaceManagerV2 the default BlockManager to use. ( #8678 )
2024-10-03 09:44:21 -07:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion ( #8839 )
2024-10-01 16:04:42 -07:00
Varun Sundar Rabindranath
c2ec430ab5
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path ( #8378 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-09-27 13:32:07 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Archit Patke
6da1ab6b41
[Core] Adding Priority Scheduling ( #5958 )
2024-09-24 19:50:50 -07:00
Jee Jee Li
13f9f7a3d0
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00