xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-02 16:27:17 +08:00

Author	SHA1	Message	Date
Cyrus Leung	3f674a49b5	[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)	2024-08-14 17:55:42 +00:00
Wallas Henrique	70b746efcf	[Misc] Deprecation Warning when setting --engine-use-ray (#7424) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-14 09:44:27 -07:00
jack	67d115db08	[Bugfix][Frontend] Disable embedding API for chat models (#7504) Co-authored-by: jack <jack@alex>	2024-08-14 09:15:19 -07:00
youkaichao	d3d9cb6e4b	[ci] fix model tests (#7507)	2024-08-14 01:01:43 -07:00
Chang Su	c134a46402	Fix empty output when temp is too low (#2937) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-08-14 05:31:44 +00:00
youkaichao	199adbb7cf	[doc] update test script to include cudagraph (#7501)	2024-08-13 21:52:58 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493)	2024-08-13 20:37:30 -07:00
youkaichao	ea49e6a3c8	[misc][ci] fix cpu test with plugins (#7489)	2024-08-13 19:27:46 -07:00
Jee Jee Li	97992802f3	[CI/Build]Reduce the time consumption for LoRA tests (#7396)	2024-08-13 17:27:29 -07:00
Woosuk Kwon	59edd0f134	[Bugfix][CI] Import ray under guard (#7486)	2024-08-13 17:12:58 -07:00
Woosuk Kwon	a08df8322e	[TPU] Support multi-host inference (#7457)	2024-08-13 16:31:20 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426)	2024-08-13 16:24:17 -07:00
Kyle Sayers	373538f973	[Misc] `compressed-tensors` code reuse (#7277)	2024-08-13 19:05:15 -04:00
youkaichao	33e5d7e6b6	[frontend] spawn engine process from api server process (#7484)	2024-08-13 15:40:17 -07:00
Simon Mo	c5c7768264	Announce NVIDIA Meetup (#7483) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-13 14:28:36 -07:00
Dipika Sikka	b1e5afc3e7	[Misc] Update `awq` and `awq_marlin` to use `vLLMParameters` (#7422)	2024-08-13 17:08:20 -04:00
Dipika Sikka	d3bdfd3ab9	[Misc] Update Fused MoE weight loading (#7334)	2024-08-13 14:57:45 -04:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281)	2024-08-13 14:30:11 -04:00
Dipika Sikka	181abbc27d	[Misc] Update LM Eval Tolerance (#7473)	2024-08-13 14:28:14 -04:00
Peter Salas	00c3d68e45	[Frontend][Core] Add plumbing to support audio language models (#7446)	2024-08-13 17:39:33 +00:00
Woosuk Kwon	e20233d361	Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467)	2024-08-13 01:37:08 -07:00
Woosuk Kwon	d6e634f3d7	[TPU] Suppress import custom_ops warning (#7458)	2024-08-13 00:30:30 -07:00
youkaichao	4d2dc5072b	[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)	2024-08-13 00:16:42 -07:00
Cyrus Leung	7025b11d94	[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)	2024-08-13 05:33:41 +00:00
Kevin H. Luu	5469146bcc	[ci] Remove fast check cancel workflow (#7455)	2024-08-12 21:19:51 -07:00
Andrew Wang	97a6be95ba	[Misc] improve logits processors logging message (#7435)	2024-08-13 02:29:34 +00:00
Cyrus Leung	9ba85bc152	[mypy] Misc. typing improvements (#7417)	2024-08-13 09:20:20 +08:00
Rui Qiao	198d6a2898	[Core] Shut down aDAG workers with clean async llm engine exit (#7224) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-12 17:57:16 -07:00
Daniele	774cd1d3bf	[CI/Build] bump minimum cmake version (#6999)	2024-08-12 16:29:20 -07:00
sasha0552	91294d56e1	[Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398)	2024-08-12 16:07:20 -07:00
jon-chuang	a046f86397	[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-08-12 22:47:41 +00:00
Cyrus Leung	4ddc4743d7	[Core] Consolidate `GB` constant and enable float GB arguments (#7416)	2024-08-12 14:14:14 -07:00
Lucas Wilkinson	6aa33cb2dd	[Misc] Use scalar type to dispatch to different `gptq_marlin` kernels (#7323)	2024-08-12 14:40:13 -04:00
Kevin H. Luu	1137f343aa	[ci] Cancel fastcheck when PR is ready (#7433) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:59:14 -07:00
Kevin H. Luu	9b3e2edd30	[ci] Cancel fastcheck run when PR is marked ready (#7427) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:56:52 -07:00
Kevin H. Luu	65950e8f58	[ci] Entrypoints run upon changes in vllm/ (#7423) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:18:03 -07:00
Woosuk Kwon	cfba4def5d	[Bugfix] Fix logit soft cap in flash-attn backend (#7425)	2024-08-12 09:58:28 -07:00
Daniele	d2bc4510a4	[CI/Build] bump Dockerfile.neuron image base, use public ECR (#6832)	2024-08-12 09:53:35 -07:00
Cyrus Leung	24154f8618	[Frontend] Disallow passing `model` as both argument and option (#7347)	2024-08-12 12:58:34 +00:00
Roger Wang	e6e42e4b17	[Core][VLM] Support image embeddings as input (#6613)	2024-08-12 16:16:06 +08:00
Lily Liu	ec2affa8ae	[Kernel] Flashinfer correctness fix for v0.1.3 (#7319)	2024-08-12 07:59:17 +00:00
Roger Wang	86ab567bae	[CI/Build] Minor refactoring for vLLM assets (#7407)	2024-08-12 02:41:52 +00:00
Simon Mo	f020a6297e	[Docs] Update readme (#7316)	2024-08-11 17:13:37 -07:00
youkaichao	6c8e595710	[misc] add commit id in collect env (#7405)	2024-08-11 15:40:48 -07:00
tomeras91	02b1988b9f	[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403)	2024-08-11 14:38:17 -07:00
tomeras91	386087970a	[CI/Build] build on empty device for better dev experience (#4773)	2024-08-11 13:09:44 -07:00
William Lin	c08e2b3086	[core] [2/N] refactor worker_base input preparation for multi-step (#7387)	2024-08-11 08:50:08 -07:00
Noam Gat	4fb7b52a2c	Updating LM Format Enforcer version to v0.10.6 (#7189)	2024-08-11 08:11:50 -04:00
Woosuk Kwon	90bab18f24	[TPU] Use mark_dynamic to reduce compilation time (#7340)	2024-08-10 18:12:22 -07:00
Isotr0py	4c5d8e8ea9	[Bugfix] Fix phi3v batch inference when images have different aspect ratio (#7392)	2024-08-10 16:19:33 +00:00

2318 Commits