Cyrus Leung
ed46f14321
[Model] Support is_causal HF config field for Qwen2 model ( #10621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 09:51:20 +00:00
youkaichao
05d1f8c9c6
[misc] move functions to config.py ( #10624 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 09:27:30 +00:00
youkaichao
25d806e953
[misc] add torch.compile compatibility check ( #10618 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:40:08 -08:00
youkaichao
65813781a2
[torch.compile] add warning for unsupported models ( #10622 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:27:51 -08:00
Jee Jee Li
7c2134beda
[torch.compile] force inductor threads ( #10620 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-24 23:04:21 -08:00
Cyrus Leung
a30a605d21
[Doc] Add encoder-based models to Supported Models page ( #10616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 06:34:07 +00:00
youkaichao
571841b7fc
[torch.compile] support encoder based models ( #10613 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 05:24:33 +00:00
Mengqing Cao
7ea3cd7c3e
[Refactor][MISC] del redundant code in ParallelConfig.postinit ( #10614 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-25 05:14:56 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
Zhuohan Li
49628fe13e
[Doc] Update README.md with Ray Summit talk links ( #10610 )
2024-11-24 16:45:09 -08:00
youkaichao
e4fbb14414
[doc] update the code to add models ( #10603 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-24 11:21:40 -08:00
youkaichao
c055747867
[model][utils] add extract_layer_index utility function ( #10599 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-23 22:22:54 -08:00
youkaichao
eda2b3589c
Revert "Print running script to enhance CI log readability" ( #10601 )
2024-11-23 21:31:47 -08:00
Jee Jee Li
1c445dca51
[CI/Build] Print running script to enhance CI log readability ( #10594 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-24 03:57:13 +00:00
Jee Jee Li
1700c543a5
[Bugfix] Fix LoRA weight sharding ( #10450 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-23 17:23:17 -08:00
Jee Jee Li
17d8fc1806
[bugfix] Fix example/tensorize_vllm_model tests ( #10595 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-23 17:22:33 -08:00
Isotr0py
04668ebe7a
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama ( #10593 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-23 18:12:20 +00:00
Nishidha
651f6c31ac
For ppc64le, disabled tests for now and addressed space issues ( #10538 )
2024-11-23 09:33:53 +00:00
JiHuazhong
86a44fb896
[Platforms] Refactor openvino code ( #10573 )
...
Signed-off-by: statelesshz <hzji210@gmail.com>
2024-11-22 22:23:12 -08:00
Isotr0py
4cfe5d2bca
[Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel ( #10541 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-22 21:25:46 -08:00
Cyrus Leung
c8acd80548
[2/N] handling placeholders in merged multi-modal processor ( #10485 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-22 21:25:09 -08:00
Ricky Xu
4634a89d18
Prefix Cache Aware Scheduling [1/n] ( #10128 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-22 21:15:55 -08:00
kliuae
7c25fe45a6
[AMD] Add support for GGUF quantization on ROCm ( #10254 )
2024-11-22 21:14:49 -08:00
Michael Goin
02a43f82a9
Update default max_num_batch_tokens for chunked prefill to 2048 ( #10544 )
2024-11-22 21:14:19 -08:00
Chen Wu
cfea9c04ef
[Model] Fix Baichuan BNB online quantization ( #10572 )
...
Signed-off-by: Chen Wu <cntryroa@gmail.com>
2024-11-22 21:13:59 -08:00
Varun Vinayak Shenoy
7d8ffb344f
[Bugfix] Internal Server Error when tool_choice is incorrect. ( #10567 )
...
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2024-11-22 21:13:29 -08:00
youkaichao
4aba6e3d1a
[core] gemma2 full context length support ( #10584 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 20:13:54 -08:00
Tyler Michael Smith
978b39744b
[Misc] Add pynccl wrappers for all_gather and reduce_scatter ( #9432 )
2024-11-22 22:14:03 -05:00
Russell Bryant
ebda51968b
[Core] Fix broken log configuration ( #10458 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-23 10:23:51 +08:00
Travis Johnson
9195dbdbca
[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use ( #10164 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-23 10:17:38 +08:00
youkaichao
d559979c54
[bugfix] fix cpu tests ( #10585 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 17:34:03 -08:00
Zhonghua Deng
d345f409b7
[V1] EngineCore supports profiling ( #10564 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2024-11-22 17:16:15 -08:00
Russell Bryant
28598f3939
[Core] remove temporary local variables in LLMEngine.__init__ ( #10577 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-22 16:22:53 -08:00
zixuanzhang226
948c859571
support bitsandbytes quantization with qwen model ( #10549 )
...
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
2024-11-22 16:16:14 -08:00
Ricky Xu
97814fbf0f
[v1] Refactor KVCacheManager for more hash input than token ids ( #10507 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-22 23:27:25 +00:00
youkaichao
eebad39f26
[torch.compile] support all attention backends ( #10558 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 14:04:42 -08:00
youkaichao
db100c5cde
[bugfix] fix full graph tests ( #10581 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 10:02:14 -08:00
Noam Gat
11fcf0e066
Remove token-adding chat embedding params ( #10551 )
...
Signed-off-by: Noam Gat <noamgat@gmail.com>
2024-11-21 23:59:47 -08:00
Isotr0py
b6374e09b0
[Bugfix] Fix Phi-3 BNB quantization with tensor parallel ( #9948 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-22 15:01:56 +08:00
youkaichao
a111d0151f
[platforms] absorb worker cls difference into platforms folder ( #10555 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-21 21:00:32 -08:00
Woosuk Kwon
446c7806b2
[Minor] Fix line-too-long ( #10563 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-21 19:40:40 -08:00
youkaichao
33e0a2540a
[9/N] torch.compile LLM usage ( #10552 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-21 19:13:31 -08:00
Simon Mo
aed074860a
[Benchmark] Add new H100 machine ( #10547 )
2024-11-21 18:27:20 -08:00
Michael Goin
9afa014552
Add small example to metrics.rst ( #10550 )
2024-11-21 23:43:43 +00:00
Woosuk Kwon
46fe9b46d8
[Minor] Revert change in offline inference example ( #10545 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-21 21:28:16 +00:00
youkaichao
cf656f5a02
[misc] improve error message ( #10553 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-21 13:13:17 -08:00
Yunmeng
edec3385b6
[CI][Installation] Avoid uploading CUDA 11.8 wheel ( #10535 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-11-21 13:03:58 -08:00
Woosuk Kwon
f9310cbd0c
[V1] Fix Compilation config & Enable CUDA graph by default ( #10528 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-21 12:53:39 -08:00
youkaichao
7560ae5caf
[8/N] enable cli flag without a space ( #10529 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-21 12:30:42 -08:00
Cyrus Leung
e7a8341c7c
[Bugfix] Allow token ID-only inputs in Qwen2-Audio ( #10536 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-21 18:09:43 +00:00