xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-23 12:44:27 +08:00

Author	SHA1	Message	Date
Rafael Vasquez	de24046fcd	[Doc] Improve contributing and installation documentation (#9132 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-08 20:22:08 +00:00
Sayak Paul	1874c6a1b0	[Doc] Update vlm.rst to include an example on videos (#9155 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-08 18:12:29 +00:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537 )	2024-10-08 09:38:40 -07:00
Peter Pan	cfba685bd4	[CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2024-10-08 09:37:34 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Alex Brooks	a3691b6b5e	[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:12:56 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117 )	2024-10-08 05:51:43 +00:00
youkaichao	e1faa2a598	[misc] improve ux on readme (#9147 )	2024-10-07 22:26:25 -07:00
Kunshang Ji	80b57f00d5	[Intel GPU] Fix xpu decode input (#9145 )	2024-10-08 03:51:14 +00:00
youkaichao	04c12f8157	[misc] update utils to support comparing multiple settings (#9140 )	2024-10-08 02:51:49 +00:00
Simon Mo	8eeb857084	Add Slack to README (#9137 )	2024-10-07 17:06:21 -07:00
youkaichao	fa45513a51	[misc] fix comment and variable name (#9139 )	2024-10-07 16:07:05 -07:00
Kuntai Du	c0d9a98d0c	[Doc] Include performance benchmark in README (#9135 )	2024-10-07 15:04:06 -07:00
Russell Bryant	e0dbdb013d	[CI/Build] Add linting for github actions workflows (#7876 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-07 21:18:10 +00:00
TimWang	93cf74a8a7	[Doc]: Add deploying_with_k8s guide (#8451 )	2024-10-07 13:31:45 -07:00
Cyrus Leung	151ef4efd2	[Model] Support NVLM-D and fix QK Norm in InternViT (#9045 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2024-10-07 11:55:12 +00:00
Isotr0py	f19da64871	[Core] Refactor GGUF parameters packing and forwarding (#8859 )	2024-10-07 10:01:46 +00:00
Isotr0py	4f95ffee6f	[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089 )	2024-10-07 06:50:35 +00:00
Cyrus Leung	8c6de96ea1	[Model] Explicit interface for vLLM models and support OOT embedding models (#9108 )	2024-10-07 06:10:35 +00:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105 )	2024-10-07 05:47:04 +00:00
sroy745	c8f26bb636	[BugFix][Core] Fix BlockManagerV2 when Encoder Input is None (#9103 )	2024-10-07 03:52:42 +00:00
Isotr0py	487678d046	[Bugfix][Hardware][CPU] Fix CPU model input for decode (#9044 )	2024-10-06 19:14:27 -07:00
Varun Sundar Rabindranath	cb3b2b9ba4	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-06 12:48:11 -07:00
Yanyi Liu	fdf59d30ea	[Bugfix] fix tool_parser error handling when serve a model not support it (#8709 )	2024-10-06 12:51:08 +00:00
Cyrus Leung	b22b798471	[Model] PP support for embedding models and update docs (#9090 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-06 16:35:27 +08:00
Cyrus Leung	f22619fe96	[Misc] Remove user-facing error for removed VLM args (#9104 )	2024-10-06 01:33:52 -07:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
TJian	23fea8714a	[Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101 )	2024-10-06 13:00:04 +08:00
youkaichao	f4dd830e09	[core] use forward context for flash infer (#9097 )	2024-10-05 19:37:31 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960 )	2024-10-05 17:35:11 +00:00
Chen Zhang	cfadb9c687	[Bugfix] Deprecate registration of custom configs to huggingface (#9083 )	2024-10-05 21:56:40 +08:00
Xin Yang	15986f598c	[Model] Support Gemma2 embedding model (#9004 )	2024-10-05 06:57:05 +00:00
hhzhang16	53b3a33027	[Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979 )	2024-10-04 22:05:37 -07:00
Chen Zhang	dac914b0d6	[Bugfix] use blockmanagerv1 for encoder-decoder (#9084 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-05 04:45:38 +00:00
Zhuohan Li	a95354a36e	[Doc] Update README.md with Ray summit slides (#9088 )	2024-10-05 02:54:45 +00:00
youkaichao	663874e048	[torch.compile] improve allreduce registration (#9061 )	2024-10-04 16:43:50 -07:00
Chongming Ni	cc90419e89	[Hardware][Neuron] Add on-device sampling support for Neuron (#8746 ) Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>	2024-10-04 16:42:20 -07:00
Cody Yu	27302dd584	[Misc] Fix CI lint (#9085 )	2024-10-04 16:07:54 -07:00
Andy Dai	0cc566ca8f	[Misc] Add random seed for prefix cache benchmark (#9081 )	2024-10-04 21:58:57 +00:00
Andy Dai	05c531be47	[Misc] Improved prefix cache example (#9077 )	2024-10-04 21:38:42 +00:00
Kuntai Du	fbb74420e7	[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412 )	2024-10-04 14:01:44 -07:00
ElizaWszola	05d686432f	[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973 ) Co-authored-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu>	2024-10-04 12:34:44 -06:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Varad Ahirwadkar	e5dc713c23	[Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039 ) Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>	2024-10-04 17:24:42 +00:00
Simon Mo	36eecfbddb	Remove AMD Ray Summit Banner (#9075 )	2024-10-04 10:17:16 -07:00
Prashant Gupta	9ade8bbc8d	[Model] add a bunch of supported lora modules for mixtral (#9008 ) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2024-10-04 16:24:40 +00:00
Lucas Wilkinson	22482e495e	[Bugfix] Flash attention arches not getting set properly (#9062 )	2024-10-04 09:43:15 -06:00
whyiug	3d826d2c52	[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071 )	2024-10-04 14:34:58 +00:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00


2925 Commits