Aurick Qiao
|
d5c5154fcf
|
[Misc] LoRA + Chunked Prefill (#9057)
|
2024-12-11 10:09:20 +08:00 |
|
Jee Jee Li
|
d05f88679b
|
[Misc][LoRA] Add PEFTHelper for LoRA (#11003)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-10 11:12:01 +00:00 |
|
youkaichao
|
ebf778061d
|
monitor metrics of tokens per step using cudagraph batchsizes (#11031)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-09 22:35:36 -08:00 |
|
Tyler Michael Smith
|
28b3a1c7e5
|
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-10 06:28:14 +00:00 |
|
Isotr0py
|
a811dd6608
|
[Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-09 12:55:10 -08:00 |
|
Jee Jee Li
|
ca871491ed
|
[Misc][LoRA] Abstract PunicaWrapper (#10955)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-09 12:54:44 -08:00 |
|
youkaichao
|
fd57d2b534
|
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:05:21 +00:00 |
|
zhou fan
|
78029b34ed
|
[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928)
Signed-off-by: xffxff <1247714429@qq.com>
|
2024-12-08 01:21:18 +08:00 |
|
Cyrus Leung
|
c889d5888b
|
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:20:49 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Cyrus Leung
|
955fa9533a
|
[3/N] Support and implement merged input processor for LLaVA model (#10676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-07 00:50:58 -08:00 |
|
Cyrus Leung
|
222f5b082a
|
[CI/Build] Fix broken multimodal test (#10950)
|
2024-12-06 10:41:23 +00:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Isotr0py
|
998eeafe58
|
[CI/Build] Bump test transformers version (#10106)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 16:05:52 +00:00 |
|
Jee Jee Li
|
571da8fc43
|
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:22:28 +00:00 |
|
Michael Goin
|
8d370e91cb
|
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-05 11:14:06 +08:00 |
|
Xin Yang
|
01d079fd8e
|
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-12-04 17:40:16 +00:00 |
|
Tyler Michael Smith
|
d2bd88b122
|
[CI/Build] Replace mean with torch.all in test_pynccl.py (#10876)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-04 03:23:21 +00:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
youkaichao
|
dc5ce861bf
|
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 06:19:02 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|
Jee Jee Li
|
b45f0d7946
|
[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-02 17:53:36 +00:00 |
|
zhou fan
|
ef31eabc68
|
[Model]: add some tests for aria model (#10770)
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 05:36:36 +00:00 |
|
Woosuk Kwon
|
073a4bd1c0
|
[Kernel] Use out arg in flash_attn_varlen_func (#10811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-01 17:55:39 -08:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Cyrus Leung
|
d2f058e76c
|
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 14:36:51 +08:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
Cyrus Leung
|
fa6ecb9aa7
|
[Model] Clean up MiniCPMV (#10751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-29 04:47:06 +00:00 |
|
sixgod
|
5fc5ce0fe4
|
[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-11-28 14:53:31 +00:00 |
|
Woosuk Kwon
|
a79b122400
|
[V1] Do not allocate beyond the max_model_len (#10730)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 00:13:15 -08:00 |
|
Ricky Xu
|
d9b4b3f069
|
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-27 23:59:28 -08:00 |
|
tomeras91
|
395b1c7454
|
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-11-27 13:21:10 -08:00 |
|
Mor Zusman
|
197b4484a3
|
[Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-11-27 19:02:27 +00:00 |
|
youkaichao
|
308cc5e21e
|
[ci] fix slow tests (#10698)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-27 09:26:14 -08:00 |
|
shunxing12345
|
1209261e93
|
[Model] Support telechat2 (#10311)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-11-27 11:32:35 +00:00 |
|
jeongin601
|
1bf905ddaa
|
[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. (#10198)
Signed-off-by: jeongin601 <0200angela@gmail.com>
Signed-off-by: jeong_in.bae <jeong_in.bae@navercorp.com>
|
2024-11-27 05:07:30 +00:00 |
|
Chendi.Xue
|
0a71900bc9
|
Remove hard-dependencies of Speculative decode to CUDA workers (#10587)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2024-11-26 17:57:11 -08:00 |
|
Murali Andoorveedu
|
db66e018ea
|
[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
|
2024-11-26 09:11:16 -08:00 |
|
youkaichao
|
334d64d1e8
|
[ci] add vllm_test_utils (#10659)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-26 00:20:04 -08:00 |
|
Sage Moore
|
9a88f89799
|
custom allreduce + torch.compile (#10121)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 22:00:16 -08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
Shane A
|
9db713a1dc
|
[Model] Add OLMo November 2024 model (#10503)
|
2024-11-25 17:26:40 -05:00 |
|
zhou fan
|
b1d920531f
|
[Model]: Add support for Aria model (#10514)
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-11-25 18:10:55 +00:00 |
|
Wallas Henrique
|
c27df94e1f
|
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-11-25 12:23:32 -05:00 |
|
Chauncey
|
d04b13a380
|
[Bug]: Authorization ignored when root_path is set (#10606)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-25 16:21:41 +00:00 |
|
Cyrus Leung
|
ed46f14321
|
[Model] Support is_causal HF config field for Qwen2 model (#10621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 09:51:20 +00:00 |
|
youkaichao
|
05d1f8c9c6
|
[misc] move functions to config.py (#10624)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 09:27:30 +00:00 |
|
youkaichao
|
25d806e953
|
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:40:08 -08:00 |
|
youkaichao
|
571841b7fc
|
[torch.compile] support encoder based models (#10613)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 05:24:33 +00:00 |
|