youkaichao
c411def234
[torch.compile] fix shape specialization ( #10722 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-27 10:16:10 -08:00
Chendi.Xue
0a71900bc9
Remove hard-dependencies of Speculative decode to CUDA workers ( #10587 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2024-11-26 17:57:11 -08:00
Murali Andoorveedu
db66e018ea
[Bugfix] Fix for Spec model TP + Chunked Prefill ( #10232 )
...
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
2024-11-26 09:11:16 -08:00
Wallas Henrique
c27df94e1f
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices ( #9850 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-11-25 12:23:32 -05:00
Cyrus Leung
ed46f14321
[Model] Support is_causal HF config field for Qwen2 model ( #10621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 09:51:20 +00:00
youkaichao
05d1f8c9c6
[misc] move functions to config.py ( #10624 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 09:27:30 +00:00
youkaichao
25d806e953
[misc] add torch.compile compatibility check ( #10618 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:40:08 -08:00
Mengqing Cao
7ea3cd7c3e
[Refactor][MISC] del redundant code in ParallelConfig.postinit ( #10614 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-25 05:14:56 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
kliuae
7c25fe45a6
[AMD] Add support for GGUF quantization on ROCm ( #10254 )
2024-11-22 21:14:49 -08:00
Michael Goin
02a43f82a9
Update default max_num_batch_tokens for chunked prefill to 2048 ( #10544 )
2024-11-22 21:14:19 -08:00
youkaichao
4aba6e3d1a
[core] gemma2 full context length support ( #10584 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 20:13:54 -08:00
youkaichao
eebad39f26
[torch.compile] support all attention backends ( #10558 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 14:04:42 -08:00
youkaichao
a111d0151f
[platforms] absorb worker cls difference into platforms folder ( #10555 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-21 21:00:32 -08:00
Woosuk Kwon
f9310cbd0c
[V1] Fix Compilation config & Enable CUDA graph by default ( #10528 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-21 12:53:39 -08:00
youkaichao
aaddce5d26
[platforms] improve error message for unspecified platforms ( #10520 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 23:07:56 -08:00
Luka Govedič
8b0fe06c89
[torch.compile] Inductor code caching fix ( #10273 )
...
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
2024-11-20 21:44:57 -08:00
Mengqing Cao
9d827170a3
[Platforms] Add device_type in Platform ( #10508 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-21 04:44:20 +00:00
youkaichao
388ee3de66
[torch.compile] limit inductor threads and lazy import quant ( #10482 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 18:36:33 -08:00
youkaichao
0cd3d9717e
[7/N] torch.compile, reduce compilation time ( #10460 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 11:20:38 -08:00
youkaichao
803f37eaaa
[6/N] torch.compile rollout to users ( #10437 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-19 10:09:03 -08:00
youkaichao
a03ea40792
[3/N][torch.compile] consolidate custom op logging ( #10399 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-18 15:14:59 -08:00
youkaichao
51bb12d17b
[4/N][torch.compile] clean up set_torch_compile_backend ( #10401 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-17 23:57:20 -08:00
youkaichao
4fd9375028
[2/N][torch.compile] make compilation cfg part of vllm cfg ( #10383 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-16 18:02:14 -08:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00
Cyrus Leung
2ac6d0e75b
[Misc] Consolidate pooler config overrides ( #10351 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
Cyrus Leung
972112d82f
[Bugfix] Fix unable to load some models ( #10312 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-14 16:55:54 -08:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor ( #10044 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
Umesh
8a06428c70
[LoRA] Adds support for bias in LoRA ( #5733 )
...
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
2024-11-12 11:08:40 -08:00
youkaichao
eea55cca5b
[1/N] torch.compile user interface design ( #10237 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 18:01:06 -08:00
Russell Bryant
9cdba9669c
[Doc] Update help text for --distributed-executor-backend ( #10231 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-12 09:55:09 +08:00
Robert Shaw
6ace6fba2c
[V1] AsyncLLM Implementation ( #9826 )
...
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00
Cyrus Leung
51c2e1fcef
[CI/Build] Split up models tests ( #10069 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
Krishna Mandal
b09895a618
[Frontend][Core] Override HF config.json via CLI ( #5836 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 16:19:27 +00:00
sroy745
f6778620a9
Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 ( #10136 )
...
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-08 15:56:18 +00:00
Nicolò Lucchesi
9d43afcc53
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding ( #9291 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2024-11-07 08:15:14 -08:00
Flávia Béo
aa9078fa03
Adds method to read the pooling types from model's files ( #9506 )
...
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 08:42:40 +00:00
Cyrus Leung
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config ( #10104 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend ( #6143 )
...
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
youkaichao
8d72bb20fa
[4/N] make quant config first-class citizen ( #9978 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-04 08:51:31 -08:00
Chauncey
ac6b8f19b9
[Frontend] Multi-Modality Support for Loading Local Image Files ( #9915 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-04 15:34:57 +00:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner ( #9938 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
Roger Wang
3ea2dc2ec4
[Misc] Remove deprecated arg for cuda graph capture ( #9864 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-10-31 07:22:07 +00:00
Went-Liang
81f09cfd80
[Model] Support math-shepherd-mistral-7b-prm model ( #9697 )
...
Signed-off-by: Went-Liang <wenteng_liang@163.com>
2024-10-30 09:33:42 -07:00
科英
74fc2d77ae
[Misc] Add metrics for request queue time, forward time, and execute time ( #9659 )
2024-10-29 10:32:56 -07:00
wangshuai09
4e2d95e372
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
Mengqing Cao
5cbdccd151
[Hardware][openvino] is_openvino --> current_platform.is_openvino ( #9716 )
2024-10-26 10:59:06 +00:00
Vinay R Damodaran
33bab41060
[Bugfix]: Make chat content text allow type content ( #9358 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Mengqing Cao
2394962d70
[Hardware][XPU] using current_platform.is_xpu ( #9605 )
2024-10-23 08:28:21 +00:00