youkaichao
334d64d1e8
[ci] add vllm_test_utils ( #10659 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-26 00:20:04 -08:00
Cyrus Leung
940635343a
[Misc] Remove outdated init protocols ( #10655 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-26 14:55:00 +08:00
Sage Moore
9a88f89799
custom allreduce + torch.compile ( #10121 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-25 22:00:16 -08:00
Ricky Xu
519e8e4182
[v1] EngineArgs for better config handling for v1 ( #10382 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-25 21:09:43 -08:00
Sanket Kale
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs ( #9228 )
...
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
youkaichao
45ac4ff270
[bugfix] fix aria model and add torch.compile ( #10645 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 18:32:09 -08:00
youkaichao
6e9ff050c8
[misc] do not read HOST_IP ( #10644 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 17:04:50 -08:00
Shane A
9db713a1dc
[Model] Add OLMo November 2024 model ( #10503 )
2024-11-25 17:26:40 -05:00
Cyrus Leung
1b583cfefa
[Doc] Fix typos in docs ( #10636 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 10:15:45 -08:00
Cyrus Leung
cf73f0c95e
[Model] Enable optional prefix when loading embedding models ( #10639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 18:14:33 +00:00
zhou fan
b1d920531f
[Model]: Add support for Aria model ( #10514 )
...
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-25 18:10:55 +00:00
Simon Mo
452a4e80c3
[Docs] Add Snowflake Slides ( #10641 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-11-25 09:34:46 -08:00
Wallas Henrique
c27df94e1f
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices ( #9850 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-11-25 12:23:32 -05:00
Chauncey
d04b13a380
[Bug]: Authorization ignored when root_path is set ( #10606 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-25 16:21:41 +00:00
fzyzcjy
2b0879bfc2
Super tiny little typo fix ( #10633 )
2024-11-25 13:08:30 +00:00
Cyrus Leung
ed46f14321
[Model] Support is_causal HF config field for Qwen2 model ( #10621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 09:51:20 +00:00
youkaichao
05d1f8c9c6
[misc] move functions to config.py ( #10624 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 09:27:30 +00:00
youkaichao
25d806e953
[misc] add torch.compile compatibility check ( #10618 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:40:08 -08:00
youkaichao
65813781a2
[torch.compile] add warning for unsupported models ( #10622 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:27:51 -08:00
Jee Jee Li
7c2134beda
[torch.compile] force inductor threads ( #10620 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-24 23:04:21 -08:00
Cyrus Leung
a30a605d21
[Doc] Add encoder-based models to Supported Models page ( #10616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 06:34:07 +00:00
youkaichao
571841b7fc
[torch.compile] support encoder based models ( #10613 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 05:24:33 +00:00
Mengqing Cao
7ea3cd7c3e
[Refactor][MISC] del redundant code in ParallelConfig.postinit ( #10614 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-25 05:14:56 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
Zhuohan Li
49628fe13e
[Doc] Update README.md with Ray Summit talk links ( #10610 )
2024-11-24 16:45:09 -08:00
youkaichao
e4fbb14414
[doc] update the code to add models ( #10603 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-24 11:21:40 -08:00
youkaichao
c055747867
[model][utils] add extract_layer_index utility function ( #10599 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-23 22:22:54 -08:00
youkaichao
eda2b3589c
Revert "Print running script to enhance CI log readability" ( #10601 )
2024-11-23 21:31:47 -08:00
Jee Jee Li
1c445dca51
[CI/Build] Print running script to enhance CI log readability ( #10594 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-24 03:57:13 +00:00
Jee Jee Li
1700c543a5
[Bugfix] Fix LoRA weight sharding ( #10450 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-23 17:23:17 -08:00
Jee Jee Li
17d8fc1806
[bugfix] Fix example/tensorize_vllm_model tests ( #10595 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-23 17:22:33 -08:00
Isotr0py
04668ebe7a
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama ( #10593 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-23 18:12:20 +00:00
Nishidha
651f6c31ac
For ppc64le, disabled tests for now and addressed space issues ( #10538 )
2024-11-23 09:33:53 +00:00
JiHuazhong
86a44fb896
[Platforms] Refactor openvino code ( #10573 )
...
Signed-off-by: statelesshz <hzji210@gmail.com>
2024-11-22 22:23:12 -08:00
Isotr0py
4cfe5d2bca
[Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel ( #10541 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-22 21:25:46 -08:00
Cyrus Leung
c8acd80548
[2/N] handling placeholders in merged multi-modal processor ( #10485 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-22 21:25:09 -08:00
Ricky Xu
4634a89d18
Prefix Cache Aware Scheduling [1/n] ( #10128 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-22 21:15:55 -08:00
kliuae
7c25fe45a6
[AMD] Add support for GGUF quantization on ROCm ( #10254 )
2024-11-22 21:14:49 -08:00
Michael Goin
02a43f82a9
Update default max_num_batch_tokens for chunked prefill to 2048 ( #10544 )
2024-11-22 21:14:19 -08:00
Chen Wu
cfea9c04ef
[Model] Fix Baichuan BNB online quantization ( #10572 )
...
Signed-off-by: Chen Wu <cntryroa@gmail.com>
2024-11-22 21:13:59 -08:00
Varun Vinayak Shenoy
7d8ffb344f
[Bugfix] Internal Server Error when tool_choice is incorrect. ( #10567 )
...
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2024-11-22 21:13:29 -08:00
youkaichao
4aba6e3d1a
[core] gemma2 full context length support ( #10584 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 20:13:54 -08:00
Tyler Michael Smith
978b39744b
[Misc] Add pynccl wrappers for all_gather and reduce_scatter ( #9432 )
2024-11-22 22:14:03 -05:00
Russell Bryant
ebda51968b
[Core] Fix broken log configuration ( #10458 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-23 10:23:51 +08:00
Travis Johnson
9195dbdbca
[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use ( #10164 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-23 10:17:38 +08:00
youkaichao
d559979c54
[bugfix] fix cpu tests ( #10585 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 17:34:03 -08:00
Zhonghua Deng
d345f409b7
[V1] EngineCore supports profiling ( #10564 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2024-11-22 17:16:15 -08:00
Russell Bryant
28598f3939
[Core] remove temporary local variables in LLMEngine.__init__ ( #10577 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-22 16:22:53 -08:00
zixuanzhang226
948c859571
support bitsandbytes quantization with qwen model ( #10549 )
...
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
2024-11-22 16:16:14 -08:00
Ricky Xu
97814fbf0f
[v1] Refactor KVCacheManager for more hash input than token ids ( #10507 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-22 23:27:25 +00:00