xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-10 21:57:24 +08:00

Author	SHA1	Message	Date
Cyrus Leung	f22619fe96	[Misc] Remove user-facing error for removed VLM args (#9104 )	2024-10-06 01:33:52 -07:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
TJian	23fea8714a	[Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101 )	2024-10-06 13:00:04 +08:00
youkaichao	f4dd830e09	[core] use forward context for flash infer (#9097 )	2024-10-05 19:37:31 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960 )	2024-10-05 17:35:11 +00:00
Chen Zhang	cfadb9c687	[Bugfix] Deprecate registration of custom configs to huggingface (#9083 )	2024-10-05 21:56:40 +08:00
Xin Yang	15986f598c	[Model] Support Gemma2 embedding model (#9004 )	2024-10-05 06:57:05 +00:00
hhzhang16	53b3a33027	[Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979 )	2024-10-04 22:05:37 -07:00
Chen Zhang	dac914b0d6	[Bugfix] use blockmanagerv1 for encoder-decoder (#9084 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-05 04:45:38 +00:00
Zhuohan Li	a95354a36e	[Doc] Update README.md with Ray summit slides (#9088 )	2024-10-05 02:54:45 +00:00
youkaichao	663874e048	[torch.compile] improve allreduce registration (#9061 )	2024-10-04 16:43:50 -07:00
Chongming Ni	cc90419e89	[Hardware][Neuron] Add on-device sampling support for Neuron (#8746 ) Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>	2024-10-04 16:42:20 -07:00
Cody Yu	27302dd584	[Misc] Fix CI lint (#9085 )	2024-10-04 16:07:54 -07:00
Andy Dai	0cc566ca8f	[Misc] Add random seed for prefix cache benchmark (#9081 )	2024-10-04 21:58:57 +00:00
Andy Dai	05c531be47	[Misc] Improved prefix cache example (#9077 )	2024-10-04 21:38:42 +00:00
Kuntai Du	fbb74420e7	[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412 )	2024-10-04 14:01:44 -07:00
ElizaWszola	05d686432f	[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973 ) Co-authored-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu>	2024-10-04 12:34:44 -06:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Varad Ahirwadkar	e5dc713c23	[Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039 ) Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>	2024-10-04 17:24:42 +00:00
Simon Mo	36eecfbddb	Remove AMD Ray Summit Banner (#9075 )	2024-10-04 10:17:16 -07:00
Prashant Gupta	9ade8bbc8d	[Model] add a bunch of supported lora modules for mixtral (#9008 ) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2024-10-04 16:24:40 +00:00
Lucas Wilkinson	22482e495e	[Bugfix] Flash attention arches not getting set properly (#9062 )	2024-10-04 09:43:15 -06:00
whyiug	3d826d2c52	[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071 )	2024-10-04 14:34:58 +00:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
Michael Goin	303d44790a	[Misc] Enable multi-step output streaming by default (#9047 )	2024-10-03 22:55:42 -04:00
Lucas Wilkinson	aeb37c2a72	[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845 )	2024-10-03 22:55:25 -04:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405 )	2024-10-04 10:36:39 +08:00
Domen Vreš	2838d6b38e	[Bugfix] Weight loading fix for OPT model (#9042 ) Co-authored-by: dvres <dvres@fri.uni-lj.si>	2024-10-03 19:53:29 -04:00
sroy745	91add85ec4	Fix failing spec decode test (#9054 )	2024-10-03 23:07:29 +00:00
youkaichao	9aaf14c62e	[misc] add forward context for attention (#9029 )	2024-10-03 12:09:42 -07:00
xendo	63e39937f9	[Frontend] [Neuron] Parse literals out of override-neuron-config (#8959 ) Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>	2024-10-03 18:02:07 +00:00
sroy745	f5d72b2fc6	[Core] Make BlockSpaceManagerV2 the default BlockManager to use. (#8678 )	2024-10-03 09:44:21 -07:00
Guillaume Calmettes	83caf35e08	[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020 )	2024-10-03 16:44:52 +08:00
Divakar Verma	01843c89b8	[Misc] log when using default MoE config (#8971 )	2024-10-03 04:31:07 +00:00
Travis Johnson	19a4dd0990	[Bugfix] example template should not add parallel_tool_prompt if tools is none (#9007 )	2024-10-03 03:04:17 +00:00
Nick Hill	18c2e30c57	[Doc] Update Granite model docs (#9025 )	2024-10-03 02:42:24 +00:00
Shawn Tan	19f0d25796	[Model] Adding Granite MoE. (#8206 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-03 09:33:57 +08:00
Sergey Shlyapnikov	f58d4fccc9	[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192 )	2024-10-02 17:50:01 -04:00
Varun Sundar Rabindranath	afb050b29d	[Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-02 19:44:39 +00:00
Alex Brooks	7f60520deb	[Misc] Update Default Image Mapper Error Log (#8977 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-02 11:44:38 +00:00
afeldman-nm	563649aafe	[Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com>	2024-10-02 07:52:20 +00:00
Lily Liu	1570203864	[Spec Decode] (1/2) Remove batch expansion (#8839 )	2024-10-01 16:04:42 -07:00
vlsav	22f5851b80	Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997 )	2024-10-01 11:07:06 -07:00
Cyrus Leung	4f341bd4bf	[Doc] Update list of supported models (#8987 )	2024-10-02 00:35:39 +08:00
Sebastian Schoennenbeck	35bd215168	[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965 )	2024-10-01 09:58:06 +00:00
Alex Brooks	1fe0a4264a	[Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders (#8991 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-01 09:52:44 +00:00
Isotr0py	bc4eb65b54	[Bugfix] Fix Fuyu tensor parallel inference (#8986 )	2024-10-01 17:51:41 +08:00
Divakar Verma	82f3937e59	[Misc] add process_weights_after_loading for DummyLoader (#8969 )	2024-10-01 03:46:41 +00:00


2900 Commits