Varun Sundar Rabindranath
|
25b79d9fd3
|
[V1] Input Batch Relocation (#10962)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-09 09:33:41 -08:00 |
|
wangxiyuan
|
aea2fc38c3
|
[Platform] Move async output check to platform (#10768)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-09 17:24:46 +00:00 |
|
Roger Wang
|
c690357928
|
[V1] Fix Detokenizer loading in AsyncLLM (#10997)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-09 16:27:10 +00:00 |
|
youkaichao
|
d1c2e15eb3
|
[torch.compile] add dynamo time tracking (#11005)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 23:09:04 -08:00 |
|
youkaichao
|
46004e83a2
|
[misc] clean up and unify logging (#10999)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 17:28:27 -08:00 |
|
youkaichao
|
43b05fa314
|
[torch.compile][misc] fix comments (#10993)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:18:18 -08:00 |
|
Roger Wang
|
a11f326528
|
[V1] Initial support of multimodal models for V1 re-arch (#10699)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-08 12:50:51 +00:00 |
|
youkaichao
|
fd57d2b534
|
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:05:21 +00:00 |
|
youkaichao
|
7be15d9356
|
[core][misc] remove use_dummy driver for _run_workers (#10920)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-07 12:06:08 -08:00 |
|
youkaichao
|
1b62745b1d
|
[core][executor] simplify instance id (#10976)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-07 09:33:45 -08:00 |
|
Cyrus Leung
|
c889d5888b
|
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:20:49 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Cyrus Leung
|
bf0e382e16
|
[Model] Composite weight loading for multimodal Qwen2 (#10944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 07:22:52 -07:00 |
|
Isotr0py
|
b26b4cd03c
|
[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-07 18:33:49 +08:00 |
|
Cyrus Leung
|
955fa9533a
|
[3/N] Support and implement merged input processor for LLaVA model (#10676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-07 00:50:58 -08:00 |
|
Russell Bryant
|
69d357ba12
|
[Core] Cleanup startup logging a bit (#10961)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-07 02:30:23 +00:00 |
|
youkaichao
|
dcdc3fafe5
|
[ci] fix broken tests (#10956)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:25:47 -08:00 |
|
youkaichao
|
c05cfb67da
|
[misc] fix typo (#10960)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:25:20 -08:00 |
|
Michael Goin
|
8b59631855
|
[Core] Support Lark grammars for XGrammar (#10870)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-06 08:34:29 -07:00 |
|
youkaichao
|
a1887f2c96
|
[torch.compile] fix deprecated code (#10948)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:01:23 +00:00 |
|
youkaichao
|
b031a455a9
|
[torch.compile] add logging for compilation time (#10941)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-06 10:07:15 +00:00 |
|
youkaichao
|
db87eb6c67
|
[torch.compile] use size tuning for specific sizes (#10933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 20:30:41 -08:00 |
|
Konrad Zawora
|
a43065272f
|
[Misc][Gaudi] Avoid torch.compile and enable lazy collectives (#10897)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-05 08:47:46 -08:00 |
|
Jee Jee Li
|
571da8fc43
|
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:22:28 +00:00 |
|
Jee Jee Li
|
1f958a7d52
|
[Bugfix] Fix BNB loader target_modules (#10720)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:20:26 +08:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Michael Goin
|
8d370e91cb
|
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-05 11:14:06 +08:00 |
|
Woosuk Kwon
|
2a56e1264f
|
[V1] Fix when max_model_len is not divisible by block_size (#10903)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-04 16:54:05 -08:00 |
|
Daniele
|
e4c34c23de
|
[CI/Build] improve python-only dev setup (#9621)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-04 21:48:13 +00:00 |
|
Isotr0py
|
10398b4706
|
[Model] Consolidate ViTs attention implementation without mask (#10893)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-04 18:11:08 +00:00 |
|
Xin Yang
|
01d079fd8e
|
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-12-04 17:40:16 +00:00 |
|
jianzheng
|
8db957ee3a
|
[bugfix] fixed parameter “n” when set parameter “bestof” > 1 (#10854)
Signed-off-by: jianzheng <57654625+o2363286@users.noreply.github.com>
|
2024-12-04 08:48:22 +00:00 |
|
wangxiyuan
|
b5b647b084
|
Drop ROCm load format check (#10767)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-04 04:32:21 +00:00 |
|
Gregory Shtrasberg
|
a061fe601e
|
[Build][Bugfix] Using the correct type hint (#10866)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-03 15:47:55 -05:00 |
|
tomeras91
|
7c32b6861e
|
[Frontend] correctly record prefill and decode time metrics (#10853)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-12-03 19:13:31 +00:00 |
|
Michael Goin
|
7090c27bb2
|
[Bugfix] Only require XGrammar on x86 (#10865)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 10:32:21 -08:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Yang Zheng
|
f6084f6324
|
[Speculative Decoding] Move indices to device before filtering output (#10850)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
|
2024-12-03 17:01:39 +08:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
Cyrus Leung
|
3257d449fa
|
[Misc] Remove deprecated names (#10817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:52:57 +00:00 |
|
youkaichao
|
dc5ce861bf
|
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 06:19:02 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|
Jee Jee Li
|
a4cf256159
|
[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-03 12:10:29 +08:00 |
|
zixuanzhang226
|
d746268e92
|
[Model] support bitsandbytes quantization with minicpm model (#10842)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
|
2024-12-03 03:06:41 +00:00 |
|
Isotr0py
|
4c05edb33a
|
[Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-02 23:06:09 +00:00 |
|
Jani Monoses
|
9b14d978aa
|
Fix openvino on GPU (#10793)
|
2024-12-02 18:52:19 +00:00 |
|
Yan Ma
|
519cc6ca12
|
[Misc][XPU] Avoid torch compile for XPU platform (#10747)
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 17:53:55 +00:00 |
|
Jee Jee Li
|
b45f0d7946
|
[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-02 17:53:36 +00:00 |
|
youkaichao
|
a4c4daf364
|
[misc] use out argument for flash attention (#10822)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 10:50:10 +00:00 |
|
wangxiyuan
|
995a148575
|
[doc]Update config docstring (#10732)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-02 04:14:45 +00:00 |
|