xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-10 12:25:32 +08:00

Author	SHA1	Message	Date
youkaichao	a7f65c2be9	[torch.compile] remove reset (#7975)	2024-08-28 17:32:26 -07:00
youkaichao	ce6bf3a2cf	[torch.compile] avoid Dynamo guard evaluation overhead (#7898) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-28 16:10:12 -07:00
Mor Zusman	fdd9daafa3	[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651)	2024-08-28 15:06:52 -07:00
rasmith	e5697d161c	[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386)	2024-08-28 15:37:47 -04:00
Pavani Majety	b98cc28f91	[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available. (#7798) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-28 10:01:22 -07:00
Cody Yu	e3580537a4	[Performance] Enable chunked prefill and prefix caching together (#7753)	2024-08-28 00:36:31 -07:00
Cyrus Leung	51f86bf487	[mypy][CI/Build] Fix mypy errors (#7929)	2024-08-27 23:47:44 -07:00
Peter Salas	fab5f53e2d	[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902)	2024-08-28 01:53:56 +00:00
zifeitong	5340a2dccf	[Model] Add multi-image input support for LLaVA-Next offline inference (#7230)	2024-08-28 07:09:02 +08:00
Dipika Sikka	fc911880cc	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-27 15:07:09 -07:00
Isotr0py	9db642138b	[CI/Build][VLM] Cleanup multiple images inputs model test (#7897)	2024-08-27 15:28:30 +00:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)	2024-08-27 12:40:02 +00:00
youkaichao	64cc644425	[core][torch.compile] discard the compile for profiling (#7796)	2024-08-26 21:33:58 -07:00
Nick Hill	39178c7fbc	[Tests] Disable retries and use context manager for openai client (#7565)	2024-08-26 21:33:17 -07:00
Megha Agarwal	2eedede875	[Core] Asynchronous Output Processor (#7049) Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>	2024-08-26 20:53:20 -07:00
Dipika Sikka	665304092d	[Misc] Update `qqq` to use vLLMParameters (#7805)	2024-08-26 13:16:15 -06:00
Cody Yu	2deb029d11	[Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822)	2024-08-26 11:24:53 -07:00
Cyrus Leung	029c71de11	[CI/Build] Avoid downloading all HF files in `RemoteOpenAIServer` (#7836)	2024-08-26 05:31:10 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Nick Hill	1856aff4d6	[Spec Decoding] Streamline batch expansion tensor manipulation (#7851)	2024-08-25 15:45:14 -07:00
Isotr0py	2059b8d9ca	[Misc] Remove snapshot_download usage in InternVL2 test (#7835)	2024-08-25 15:53:09 +00:00
Isotr0py	8aaf3d5347	[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783)	2024-08-25 11:51:20 +00:00
zifeitong	80162c44b1	[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840)	2024-08-24 18:16:24 -07:00
youkaichao	aab0fcdb63	[ci][test] fix RemoteOpenAIServer (#7838)	2024-08-24 17:31:28 +00:00
youkaichao	ea9fa160e3	[ci][test] exclude model download time in server start time (#7834)	2024-08-24 01:03:27 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831)	2024-08-24 00:51:38 -07:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654)	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641)	2024-08-23 23:04:22 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789)	2024-08-23 12:45:53 -07:00
Dipika Sikka	f1df5dbfd6	[Misc] Update `marlin` to use vLLMParameters (#7803)	2024-08-23 14:30:52 -04:00
Maximilien de Bayser	e25fee57c2	[BugFix] Fix server crash on empty prompt (#7746) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-08-23 13:12:44 +00:00
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00
Travis Johnson	cc0eaf12b1	[Bugfix] spec decode handle None entries in topk args in create_sequence_group_output (#7232) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-08-22 09:33:48 -04:00
Dipika Sikka	955b5191c9	[Misc] update fp8 to use `vLLMParameter` (#7437)	2024-08-22 08:36:18 -04:00
Abhinav Goyal	a3fce56b88	[Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830)	2024-08-22 02:42:24 -07:00
Michael Goin	aae74ef95c	Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764)	2024-08-22 03:42:14 +00:00
Joe Runde	cde9183b40	[Bug][Frontend] Improve ZMQ client robustness (#7443) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 02:18:11 +00:00
zifeitong	df1a21131d	[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710)	2024-08-22 09:36:24 +08:00
Luka Govedič	7937009a7e	[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce` (#7233) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-21 20:18:00 -04:00
Dipika Sikka	8678a69ab5	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-21 16:17:10 -07:00
Peter Salas	1ca0d4f86b	[Model] Add UltravoxModel and UltravoxConfig (#7615)	2024-08-21 22:49:39 +00:00
Robert Shaw	970dfdc01d	[Frontend] Improve Startup Failure UX (#7716)	2024-08-21 19:53:01 +00:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
LI MOU	53328d7536	[BUG] fix crash on flashinfer backend with cudagraph disabled, when attention group_size not in [1,2,4,8] (#7509)	2024-08-21 08:54:31 -07:00
Nick Hill	c75363fbc0	[BugFix] Avoid premature async generator exit and raise all exception variations (#7698)	2024-08-21 11:45:55 -04:00
Cyrus Leung	baaedfdb2d	[mypy] Enable following imports for entrypoints (#7248) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Fei <dfdfcai4@gmail.com>	2024-08-20 23:28:21 -07:00
Isotr0py	12e1c65bc9	[Model] Add AWQ quantization support for InternVL2 model (#7187)	2024-08-20 23:18:57 -07:00
youkaichao	9e51b6a626	[ci][test] adjust max wait time for cpu offloading test (#7709)	2024-08-20 17:12:44 -07:00
Antoni Baum	3b682179dd	[Core] Add `AttentionState` abstraction (#7663)	2024-08-20 18:50:45 +00:00

845 Commits