xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-01 04:47:53 +08:00

Author	SHA1	Message	Date
kewang-xlnx	de0526f668	[Misc][Quark] Upstream Quark format to VLLM (#10765 ) Signed-off-by: kewang-xlnx <kewang@xilinx.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-15 11:05:15 -05:00
RunningLeon	97eb97b5a4	[Model]: Support internlm3 (#12037 )	2025-01-15 11:35:17 +00:00
wangxiyuan	3adf0ffda8	[Platform] Do not raise error if _Backend is not found (#12023 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-01-15 10:14:15 +00:00
Chen Zhang	994fc655b7	[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003 )	2025-01-15 07:55:30 +00:00
youkaichao	ad34c0df0f	[core] platform agnostic executor via collective_rpc (#11256 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-15 13:45:21 +08:00
Elfie Guo	0794e7446e	[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467 )	2025-01-15 12:47:49 +08:00
Jee Jee Li	42f5e7c52a	[Kernel] Support MulAndSilu (#11624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-15 02:29:53 +00:00
Cyrus Leung	bb354e6b2d	[Bugfix] Fix various bugs in multi-modal processor (#12031 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-14 12:16:11 +00:00
Yangcheng Li	f7b3ba82c3	[MISC] fix typo in kv transfer send recv test (#11983 )	2025-01-13 05:07:48 +00:00
Robert Shaw	619ae268c3	[V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (#11973 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-01-13 04:54:10 +00:00
Isotr0py	d14e98d924	[Model] Support GGUF models newly added in `transformers` 4.46.0 (#9685 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-13 00:13:44 +00:00
Robert Shaw	9597a095f2	[V1][Core][1/n] Logging and Metrics (#11962 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-01-12 21:02:02 +00:00
Avshalom Manevich	263a870ee1	[Hardware][TPU] workaround fix for MoE on TPU (#11764 )	2025-01-12 10:53:51 -05:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Isotr0py	f967e51f38	[Model] Initialize support for Deepseek-VL2 models (#11578 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-12 00:17:24 -08:00
Nicolò Lucchesi	d697dc01b4	[Bugfix] Fix RobertaModel loading (#11940 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-01-11 14:05:09 +00:00
Cyrus Leung	a991f7d508	[Doc] Basic guide for writing unit tests for new models (#11951 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-11 21:27:24 +08:00
Cyrus Leung	7a3a83e3b8	[CI/Build] Move model-specific multi-modal processing tests (#11934 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-11 13:50:05 +08:00
youkaichao	899136b857	[ci] fix broken distributed-tests-4-gpus (#11937 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-11 09:07:24 +08:00
Li, Jiang	aa1e77a19c	[Hardware][CPU] Support MOE models on x86 CPU (#11831 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-01-10 11:07:58 -05:00
Harry Mellor	482cdc494e	[Doc] Rename offline inference examples (#11927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 23:50:29 +08:00
youkaichao	241ad7b301	[ci] Fix sampler tests (#11922 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-10 20:45:33 +08:00
Harry Mellor	d85c47d6ad	Replace "online inference" with "online serving" (#11923 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 12:05:56 +00:00
Joe Runde	ac2f3f7fee	[Bugfix] Validate lora adapters to avoid crashing server (#11727 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-10 15:56:36 +08:00
Chen Zhang	cf5f000d21	[torch.compile] Hide KV cache behind torch.compile boundary (#11677 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-10 13:14:42 +08:00
Cyrus Leung	b844b99ad3	[VLM] Enable tokenized inputs for merged multi-modal processor (#11900 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-10 03:24:00 +00:00
Cyrus Leung	9a228348d2	[Misc] Provide correct Pixtral-HF chat template (#11891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 10:19:37 -07:00
youkaichao	bd82872211	[ci]try to fix flaky multi-step tests (#11894 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-09 14:47:29 +00:00
wangxiyuan	405eb8e396	[platform] Allow platform specify attention backend (#11609 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-01-09 21:46:50 +08:00
Cyrus Leung	0bd1ff4346	[Bugfix] Override dunder methods of placeholder modules (#11882 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-09 09:02:53 +00:00
Maximilien de Bayser	1fe554bac3	treat do_lower_case in the same way as the sentence-transformers library (#11815 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-01-09 11:05:43 +08:00
Tyler Michael Smith	615e4a5401	[CI] Turn on basic correctness tests for V1 (#10864 )	2025-01-08 21:20:44 -05:00
Robert Shaw	56fe4c297c	[TPU][Quantization] TPU `W8A8` (#11785 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-08 19:33:29 +00:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
Cyrus Leung	2a0596bc48	[VLM] Reorganize profiling/processing-related code (#11812 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-08 18:59:58 +08:00
youkaichao	889e662eae	[misc] improve memory profiling (#11809 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-08 06:36:03 +00:00
Cyrus Leung	8f37be38eb	[Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (#11800 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 18:25:02 +08:00
Jee Jee Li	b278557935	[Kernel][LoRA]Punica prefill kernels fusion (#11234 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Zhonghua Deng <abatom@163.com>	2025-01-07 04:01:39 +00:00
Cyrus Leung	08fb75c72e	[Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (#11772 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-07 01:10:54 +00:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
Chen Zhang	e20c92bb61	[Kernel] Move attn_type to Attention.__init__() (#11690 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-07 00:11:28 +08:00
Jee Jee Li	32c9eff2ff	[Bugfix][V1] Fix molmo text-only inputs (#11676 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-06 15:22:25 +00:00
Cyrus Leung	996357e480	[VLM] Separate out profiling-related logic (#11746 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 16:02:21 +08:00
Rui Qiao	022c5c6944	[V1] Refactor get_executor_cls (#11754 )	2025-01-06 07:59:16 +00:00
cennn	9e764e7b10	[distributed] remove pynccl's redundant change_state (#11749 )	2025-01-06 09:05:48 +08:00
cennn	635b897246	[distributed] remove pynccl's redundant stream (#11744 )	2025-01-05 23:09:11 +08:00
Jee Jee Li	47831430cc	[Bugfix][V1] Fix test_kv_cache_utils.py (#11738 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-04 16:07:59 +00:00
Cyrus Leung	ba214dffbe	[Bugfix] Fix precision error in LLaVA-NeXT (#11735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-04 23:45:57 +08:00
Cyrus Leung	eed11ebee9	[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-04 11:40:53 +00:00
Yan Burman	300acb8347	[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233 ) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai>	2025-01-04 14:50:16 +08:00

1441 Commits