xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-22 23:15:01 +08:00

Author	SHA1	Message	Date
youkaichao	c411def234	[torch.compile] fix shape specialization (#10722 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-27 10:16:10 -08:00
Chendi.Xue	0a71900bc9	Remove hard-dependencies of Speculative decode to CUDA workers (#10587 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2024-11-26 17:57:11 -08:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Wallas Henrique	c27df94e1f	[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-11-25 12:23:32 -05:00
Cyrus Leung	ed46f14321	[Model] Support `is_causal` HF config field for Qwen2 model (#10621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-25 09:51:20 +00:00
youkaichao	05d1f8c9c6	[misc] move functions to config.py (#10624 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-25 09:27:30 +00:00
youkaichao	25d806e953	[misc] add torch.compile compatibility check (#10618 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-24 23:40:08 -08:00
Mengqing Cao	7ea3cd7c3e	[Refactor][MISC] del redundant code in ParallelConfig.postinit (#10614 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2024-11-25 05:14:56 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
kliuae	7c25fe45a6	[AMD] Add support for GGUF quantization on ROCm (#10254 )	2024-11-22 21:14:49 -08:00
Michael Goin	02a43f82a9	Update default max_num_batch_tokens for chunked prefill to 2048 (#10544 )	2024-11-22 21:14:19 -08:00
youkaichao	4aba6e3d1a	[core] gemma2 full context length support (#10584 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 20:13:54 -08:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
youkaichao	a111d0151f	[platforms] absorb worker cls difference into platforms folder (#10555 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-21 21:00:32 -08:00
Woosuk Kwon	f9310cbd0c	[V1] Fix Compilation config & Enable CUDA graph by default (#10528 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-21 12:53:39 -08:00
youkaichao	aaddce5d26	[platforms] improve error message for unspecified platforms (#10520 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-20 23:07:56 -08:00
Luka Govedič	8b0fe06c89	[torch.compile] Inductor code caching fix (#10273 ) Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Luka Govedic <luka.govedic@gmail.com>	2024-11-20 21:44:57 -08:00
Mengqing Cao	9d827170a3	[Platforms] Add `device_type` in `Platform` (#10508 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2024-11-21 04:44:20 +00:00
youkaichao	388ee3de66	[torch.compile] limit inductor threads and lazy import quant (#10482 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-20 18:36:33 -08:00
youkaichao	0cd3d9717e	[7/N] torch.compile, reduce compilation time (#10460 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-20 11:20:38 -08:00
youkaichao	803f37eaaa	[6/N] torch.compile rollout to users (#10437 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-19 10:09:03 -08:00
youkaichao	a03ea40792	[3/N][torch.compile] consolidate custom op logging (#10399 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-18 15:14:59 -08:00
youkaichao	51bb12d17b	[4/N][torch.compile] clean up set_torch_compile_backend (#10401 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-17 23:57:20 -08:00
youkaichao	4fd9375028	[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-16 18:02:14 -08:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Cyrus Leung	2ac6d0e75b	[Misc] Consolidate pooler config overrides (#10351 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-15 06:59:00 +00:00
Cyrus Leung	972112d82f	[Bugfix] Fix unable to load some models (#10312 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-14 16:55:54 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
Umesh	8a06428c70	[LoRA] Adds support for bias in LoRA (#5733 ) Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com> Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>	2024-11-12 11:08:40 -08:00
youkaichao	eea55cca5b	[1/N] torch.compile user interface design (#10237 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 18:01:06 -08:00
Russell Bryant	9cdba9669c	[Doc] Update help text for `--distributed-executor-backend` (#10231 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-12 09:55:09 +08:00
Robert Shaw	6ace6fba2c	[V1] `AsyncLLM` Implementation (#9826 ) Signed-off-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-11-11 23:05:38 +00:00
Cyrus Leung	51c2e1fcef	[CI/Build] Split up models tests (#10069 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:39:14 -08:00
Krishna Mandal	b09895a618	[Frontend][Core] Override HF `config.json` via CLI (#5836 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 16:19:27 +00:00
sroy745	f6778620a9	Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 (#10136 ) Signed-off-by: Sourashis Roy <sroy@roblox.com>	2024-11-08 15:56:18 +00:00
Nicolò Lucchesi	9d43afcc53	[Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2024-11-07 08:15:14 -08:00
Flávia Béo	aa9078fa03	Adds method to read the pooling types from model's files (#9506 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2024-11-07 08:42:40 +00:00
Cyrus Leung	db7db4aab9	[Misc] Consolidate ModelConfig code related to HF config (#10104 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 06:00:21 +00:00
Konrad Zawora	a02a50e6e5	[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143 ) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: Bob Zhu <bob.zhu@intel.com> Signed-off-by: zehao-intel <zehao.huang@intel.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai> Co-authored-by: Michal Adamczyk <madamczyk@habana.ai> Co-authored-by: Marceli Fylcek <mfylcek@habana.ai> Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com> Co-authored-by: Vivek Goel <vgoel@habana.ai> Co-authored-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Dominika Olszewska <dolszewska@habana.ai> Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com> Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com> Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai> Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com> Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai> Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com> Co-authored-by: Ilia Taraban <tarabanil@gmail.com> Co-authored-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai> Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai> Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com> Co-authored-by: Sun Choi <schoi@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com> Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com> Co-authored-by: Zehao Huang <zehao.huang@intel.com> Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com> Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com> Co-authored-by: Nir David <ndavid@habana.ai> Co-authored-by: Yu-Zhou <yu.zhou@intel.com> Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai> Co-authored-by: Karol Damaszke <kdamaszke@habana.ai> Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Jacek Czaja <jacek.czaja@intel.com> Co-authored-by: Jacek Czaja <jczaja@habana.ai> Co-authored-by: Yuan <yuan.zhou@outlook.com>	2024-11-06 01:09:10 -08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
youkaichao	8d72bb20fa	[4/N] make quant config first-class citizen (#9978 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-04 08:51:31 -08:00
Chauncey	ac6b8f19b9	[Frontend] Multi-Modality Support for Loading Local Image Files (#9915 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-04 15:34:57 +00:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
Roger Wang	3ea2dc2ec4	[Misc] Remove deprecated arg for cuda graph capture (#9864 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-10-31 07:22:07 +00:00
Went-Liang	81f09cfd80	[Model] Support math-shepherd-mistral-7b-prm model (#9697 ) Signed-off-by: Went-Liang <wenteng_liang@163.com>	2024-10-30 09:33:42 -07:00
科英	74fc2d77ae	[Misc] Add metrics for request queue time, forward time, and execute time (#9659 )	2024-10-29 10:32:56 -07:00
wangshuai09	4e2d95e372	[Hardware][ROCM] using current_platform.is_rocm (#9642 ) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-28 04:07:00 +00:00
Mengqing Cao	5cbdccd151	[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716 )	2024-10-26 10:59:06 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Mengqing Cao	2394962d70	[Hardware][XPU] using current_platform.is_xpu (#9605 )	2024-10-23 08:28:21 +00:00


361 Commits