Woosuk Kwon | a79b122400 | 2024-11-28 00:13:15 -08:00
[V1] Do not allocate beyond the max_model_len (#10730)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Ricky Xu | d9b4b3f069 | 2024-11-27 23:59:28 -08:00
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>

罗泽轩 | 278be671a3 | 2024-11-27 23:58:39 -08:00
[Doc] Update model in arch_overview.rst to match comment (#10701)
Signed-off-by: spacewander <spacewanderlzx@gmail.com>

zixuanzhang226 | 70dc14fbd0 | 2024-11-27 23:58:02 -08:00
[Model] support bitsandbytes quantization with minicpm3 model (#10682)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>

youkaichao | cb4e1c3f3a | 2024-11-27 19:54:58 -08:00
[misc] upgrade filelock version (#10731)
Signed-off-by: youkaichao <youkaichao@gmail.com>

tomeras91 | 395b1c7454 | 2024-11-27 13:21:10 -08:00
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635)
Signed-off-by: Tomer Asida <tomera@ai21.com>

Cyrus Leung | 9b4b150395 | 2024-11-27 19:05:29 +00:00
[Bugfix] Ignore lm_head when loading embedding models (#10719)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Mor Zusman | 197b4484a3 | 2024-11-27 19:02:27 +00:00
[Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
Signed-off-by: mzusman <mor.zusmann@gmail.com>

Isotr0py | b98c62ba49 | 2024-11-27 10:43:17 -08:00
[Bugfix] Fix GGUF inference with FP16 unquantized checkpoint (#10675)
Signed-off-by: Isotr0py <2037008807@qq.com>

youkaichao | c411def234 | 2024-11-27 10:16:10 -08:00
[torch.compile] fix shape specialization (#10722)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao | 308cc5e21e | 2024-11-27 09:26:14 -08:00
[ci] fix slow tests (#10698)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Roger Wang | 9e0a147d50 | 2024-11-27 12:26:27 +00:00
[V1] Update interface for mistral-format Pixtral (#10703)
Signed-off-by: Roger Wang <ywang@roblox.com>

Li, Jiang | 418cb3b93f | 2024-11-27 11:55:38 +00:00
[Bugfix][Hardware][CPU] Fix intel-omp version to avoid segfault (#10700)
Signed-off-by: jiang1.li <jiang1.li@intel.com>

shunxing12345 | 1209261e93 | 2024-11-27 11:32:35 +00:00
[Model] Support telechat2 (#10311)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>

Tyler Michael Smith | e2251109c7 | 2024-11-26 22:55:32 -08:00
[Kernel] Remove if-else with identical branches in marlin 2:4 (#10687)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

Jee Jee Li | 15cc2a9f1a | 2024-11-26 22:54:12 -08:00
[Misc]Further reduce BNB static variable (#10597)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Kunshang Ji | e85250b1d1 | 2024-11-26 22:49:40 -08:00
[Hardware][Gaudi]add get_name method for HPUAttentionBackend (#10667)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

yansh97 | cfb3bf25fb | 2024-11-27 13:55:23 +08:00
[bugfix] fix the default value of llm_int8_threshold in BitsAndBytesConfig (#10657)

jeongin601 | 1bf905ddaa | 2024-11-27 05:07:30 +00:00
[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. (#10198)
Signed-off-by: jeongin601 <0200angela@gmail.com>
Signed-off-by: jeong_in.bae <jeong_in.bae@navercorp.com>

Roger Wang | 0a4d968500 | 2024-11-27 10:04:01 +08:00
[V1] Update interface for idefics3 (#10680)
Signed-off-by: Roger Wang <ywang@roblox.com>

Chendi.Xue | 0a71900bc9 | 2024-11-26 17:57:11 -08:00
Remove hard-dependencies of Speculative decode to CUDA workers (#10587)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Roger Wang | 2f0a0a17a4 | 2024-11-26 20:46:11 +00:00
[V1] Refactor model executable interface for multimodal models (#10570)
Signed-off-by: Roger Wang <ywang@roblox.com>

Michael Goin | 7576cd38df | 2024-11-26 12:29:00 -08:00
[Bugfix] Check bnb_4bit_quant_storage for bitsandbytes (#10642)

Michael Goin | 9a99273b48 | 2024-11-26 10:44:01 -08:00
[Bugfix] Fix using -O[0,3] with LLM entrypoint (#10677)
Signed-off-by: mgoin <michael@neuralmagic.com>

Conroy Cheers | f5792c7c4a | 2024-11-26 10:26:28 -08:00
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson (#9735)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>

Murali Andoorveedu | db66e018ea | 2024-11-26 09:11:16 -08:00
[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>

Kunshang Ji | 1f6584ee85 | 2024-11-26 10:36:45 +00:00
[V1] Enable profile for LLMEngine (#10665)

youkaichao | 334d64d1e8 | 2024-11-26 00:20:04 -08:00
[ci] add vllm_test_utils (#10659)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Cyrus Leung | 940635343a | 2024-11-26 14:55:00 +08:00
[Misc] Remove outdated init protocols (#10655)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Sage Moore | 9a88f89799 | 2024-11-25 22:00:16 -08:00
custom allreduce + torch.compile (#10121)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

Ricky Xu | 519e8e4182 | 2024-11-25 21:09:43 -08:00
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>

Sanket Kale | a6760f6456 | 2024-11-25 18:32:39 -08:00
[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228)
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>

youkaichao | 45ac4ff270 | 2024-11-25 18:32:09 -08:00
[bugfix] fix aria model and add torch.compile (#10645)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao | 6e9ff050c8 | 2024-11-25 17:04:50 -08:00
[misc] do not read HOST_IP (#10644)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Shane A | 9db713a1dc | 2024-11-25 17:26:40 -05:00
[Model] Add OLMo November 2024 model (#10503)

Cyrus Leung | 1b583cfefa | 2024-11-25 10:15:45 -08:00
[Doc] Fix typos in docs (#10636)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | cf73f0c95e | 2024-11-25 18:14:33 +00:00
[Model] Enable optional prefix when loading embedding models (#10639)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

zhou fan | b1d920531f | 2024-11-25 18:10:55 +00:00
[Model]: Add support for Aria model (#10514)
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>

Simon Mo | 452a4e80c3 | 2024-11-25 09:34:46 -08:00
[Docs] Add Snowflake Slides (#10641)
Signed-off-by: simon-mo <simon.mo@hey.com>

Wallas Henrique | c27df94e1f | 2024-11-25 12:23:32 -05:00
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

Chauncey | d04b13a380 | 2024-11-25 16:21:41 +00:00
[Bug]: Authorization ignored when root_path is set (#10606)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

fzyzcjy | 2b0879bfc2 | 2024-11-25 13:08:30 +00:00
Super tiny little typo fix (#10633)

Cyrus Leung | ed46f14321 | 2024-11-25 09:51:20 +00:00
[Model] Support is_causal HF config field for Qwen2 model (#10621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

youkaichao | 05d1f8c9c6 | 2024-11-25 09:27:30 +00:00
[misc] move functions to config.py (#10624)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao | 25d806e953 | 2024-11-24 23:40:08 -08:00
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao | 65813781a2 | 2024-11-24 23:27:51 -08:00
[torch.compile] add warning for unsupported models (#10622)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Jee Jee Li | 7c2134beda | 2024-11-24 23:04:21 -08:00
[torch.compile] force inductor threads (#10620)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Cyrus Leung | a30a605d21 | 2024-11-25 06:34:07 +00:00
[Doc] Add encoder-based models to Supported Models page (#10616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

youkaichao | 571841b7fc | 2024-11-25 05:24:33 +00:00
[torch.compile] support encoder based models (#10613)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Mengqing Cao | 7ea3cd7c3e | 2024-11-25 05:14:56 +00:00
[Refactor][MISC] del redundant code in ParallelConfig.postinit (#10614)
Signed-off-by: MengqingCao <cmq0113@163.com>