kewang-xlnx
de0526f668
[Misc][Quark] Upstream Quark format to VLLM ( #10765 )
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-15 11:05:15 -05:00
RunningLeon
97eb97b5a4
[Model]: Support internlm3 ( #12037 )
2025-01-15 11:35:17 +00:00
wangxiyuan
3adf0ffda8
[Platform] Do not raise error if _Backend is not found ( #12023 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-15 10:14:15 +00:00
Chen Zhang
994fc655b7
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager ( #12003 )
2025-01-15 07:55:30 +00:00
youkaichao
ad34c0df0f
[core] platform agnostic executor via collective_rpc ( #11256 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-15 13:45:21 +08:00
Elfie Guo
0794e7446e
[Misc] Add multistep chunked-prefill support for FlashInfer ( #10467 )
2025-01-15 12:47:49 +08:00
Jee Jee Li
42f5e7c52a
[Kernel] Support MulAndSilu ( #11624 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-15 02:29:53 +00:00
Cyrus Leung
bb354e6b2d
[Bugfix] Fix various bugs in multi-modal processor ( #12031 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-14 12:16:11 +00:00
Yangcheng Li
f7b3ba82c3
[MISC] fix typo in kv transfer send recv test ( #11983 )
2025-01-13 05:07:48 +00:00
Robert Shaw
619ae268c3
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction ( #11973 )
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-13 04:54:10 +00:00
Isotr0py
d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 ( #9685 )
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
Robert Shaw
9597a095f2
[V1][Core][1/n] Logging and Metrics ( #11962 )
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-12 21:02:02 +00:00
Avshalom Manevich
263a870ee1
[Hardware][TPU] workaround fix for MoE on TPU ( #11764 )
2025-01-12 10:53:51 -05:00
Akshat Tripathi
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend ( #11100 )
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
Isotr0py
f967e51f38
[Model] Initialize support for Deepseek-VL2 models ( #11578 )
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00
Nicolò Lucchesi
d697dc01b4
[Bugfix] Fix RobertaModel loading ( #11940 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-11 14:05:09 +00:00
Cyrus Leung
a991f7d508
[Doc] Basic guide for writing unit tests for new models ( #11951 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 21:27:24 +08:00
Cyrus Leung
7a3a83e3b8
[CI/Build] Move model-specific multi-modal processing tests ( #11934 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 13:50:05 +08:00
youkaichao
899136b857
[ci] fix broken distributed-tests-4-gpus ( #11937 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-11 09:07:24 +08:00
Li, Jiang
aa1e77a19c
[Hardware][CPU] Support MOE models on x86 CPU ( #11831 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00
Harry Mellor
482cdc494e
[Doc] Rename offline inference examples ( #11927 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00
youkaichao
241ad7b301
[ci] Fix sampler tests ( #11922 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-10 20:45:33 +08:00
Harry Mellor
d85c47d6ad
Replace "online inference" with "online serving" ( #11923 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Chen Zhang
cf5f000d21
[torch.compile] Hide KV cache behind torch.compile boundary ( #11677 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-10 13:14:42 +08:00
Cyrus Leung
b844b99ad3
[VLM] Enable tokenized inputs for merged multi-modal processor ( #11900 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 03:24:00 +00:00
Cyrus Leung
9a228348d2
[Misc] Provide correct Pixtral-HF chat template ( #11891 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00
youkaichao
bd82872211
[ci] try to fix flaky multi-step tests ( #11894 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-09 14:47:29 +00:00
wangxiyuan
405eb8e396
[platform] Allow platform specify attention backend ( #11609 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-09 21:46:50 +08:00
Cyrus Leung
0bd1ff4346
[Bugfix] Override dunder methods of placeholder modules ( #11882 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 09:02:53 +00:00
Maximilien de Bayser
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library ( #11815 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00
Tyler Michael Smith
615e4a5401
[CI] Turn on basic correctness tests for V1 ( #10864 )
2025-01-08 21:20:44 -05:00
Robert Shaw
56fe4c297c
[TPU][Quantization] TPU W8A8 ( #11785 )
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-08 19:33:29 +00:00
Harry Mellor
aba8d6ee00
[Doc] Move examples into categories ( #11840 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00
Cyrus Leung
2a0596bc48
[VLM] Reorganize profiling/processing-related code ( #11812 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 18:59:58 +08:00
youkaichao
889e662eae
[misc] improve memory profiling ( #11809 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-08 06:36:03 +00:00
Cyrus Leung
8f37be38eb
[Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation ( #11800 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 18:25:02 +08:00
Jee Jee Li
b278557935
[Kernel][LoRA] Punica prefill kernels fusion ( #11234 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>
2025-01-07 04:01:39 +00:00
Cyrus Leung
08fb75c72e
[Bugfix] Fix LLaVA-NeXT feature size precision error (for real) ( #11772 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 01:10:54 +00:00
Roger Wang
91b361ae89
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision ( #11685 )
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 19:58:16 +00:00
Chen Zhang
e20c92bb61
[Kernel] Move attn_type to Attention.__init__() ( #11690 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-07 00:11:28 +08:00
Jee Jee Li
32c9eff2ff
[Bugfix][V1] Fix molmo text-only inputs ( #11676 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-06 15:22:25 +00:00
Cyrus Leung
996357e480
[VLM] Separate out profiling-related logic ( #11746 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 16:02:21 +08:00
Rui Qiao
022c5c6944
[V1] Refactor get_executor_cls ( #11754 )
2025-01-06 07:59:16 +00:00
cennn
9e764e7b10
[distributed] remove pynccl's redundant change_state ( #11749 )
2025-01-06 09:05:48 +08:00
cennn
635b897246
[distributed] remove pynccl's redundant stream ( #11744 )
2025-01-05 23:09:11 +08:00
Jee Jee Li
47831430cc
[Bugfix][V1] Fix test_kv_cache_utils.py ( #11738 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-04 16:07:59 +00:00
Cyrus Leung
ba214dffbe
[Bugfix] Fix precision error in LLaVA-NeXT ( #11735 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 23:45:57 +08:00
Cyrus Leung
eed11ebee9
[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision ( #11717 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 11:40:53 +00:00
Yan Burman
300acb8347
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture ( #11233 )
Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
2025-01-04 14:50:16 +08:00