1544 Commits

Author SHA1 Message Date
Woosuk Kwon
8bab4959be
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389) 2024-06-11 00:37:56 -07:00
Roger Wang
3c4cebf751
[Doc][Typo] Fixing Missing Comma (#5403) 2024-06-11 00:20:28 -07:00
youkaichao
d8f31f2f8b
[Doc] add debugging tips (#5409) 2024-06-10 23:21:43 -07:00
Cyrus Leung
640052b069
[Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026) 2024-06-10 22:36:46 -07:00
maor-ps
351d5e7b82
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server-defined --max-logprobs (#5312)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-11 10:30:31 +08:00
Nick Hill
a008629807
[Misc] Various simplifications and typing fixes (#5368) 2024-06-11 10:29:02 +08:00
Kevin H. Luu
76477a93b7
[ci] Fix Buildkite agent path (#5392)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 18:58:07 -07:00
Michael Goin
77c87beb06
[Doc] Add documentation for FP8 W8A8 (#5388) 2024-06-10 18:55:12 -06:00
Simon Mo
114332b88e
Bump version to v0.5.0 (#5384) 2024-06-10 15:56:06 -07:00
Woosuk Kwon
cb77ad836f
[Docs] Alphabetically sort sponsors (#5386) 2024-06-10 15:17:19 -05:00
Roger Wang
856c990041
[Docs] Add Docs on Limitations of VLM Support (#5383) 2024-06-10 09:53:50 -07:00
Kevin H. Luu
c5602f0baa
[ci] Mount buildkite agent on Docker container to upload benchmark results (#5330)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:22:34 -07:00
Kevin H. Luu
f7f9c5f97b
[ci] Use small_cpu_queue for doc build (#5331)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:21:11 -07:00
Cyrus Leung
2c0d933594
[Bugfix] Fix LLaVA-NeXT (#5380) 2024-06-10 15:38:47 +00:00
Itay Etelis
774d1035e4
[Feature][Frontend]: Extend stream_options implementation to CompletionRequest (#5319) 2024-06-10 14:22:09 +00:00
Cyrus Leung
6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
Cyrus Leung
0bfa1c4f13
[Misc] Improve error message when LoRA parsing fails (#5194) 2024-06-10 19:38:49 +08:00
youkaichao
c81da5f56d
[misc][typo] fix typo (#5372) 2024-06-10 09:51:02 +00:00
Roger Wang
68bc81703e
[Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (#5374) 2024-06-10 09:13:39 +00:00
Dipika Sikka
5884c2b454
[Misc] Update to comply with the new compressed-tensors config (#5350)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-06-10 03:49:46 +00:00
Bla_ckB
45f92c00cf
[Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) 2024-06-09 16:23:14 -07:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
youkaichao
5d7e3d0176
[misc][ci/test] fix flaky test in tests/test_sharded_state_loader.py (#5361)
2024-06-09 03:50:14 +00:00
youkaichao
0373e1837e
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074)
2024-06-08 19:14:43 -07:00
Michael Goin
c09dade2a2
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale (#5353) 2024-06-08 13:54:05 -04:00
youkaichao
8ea5e44a43
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
2024-06-08 08:59:20 +00:00
youkaichao
9fb900f90c
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
2024-06-07 22:31:32 -07:00
Hongxia Yang
c96fc06747
[ROCm][AMD] Use pytorch sdpa math backend to do naive attention (#4965) 2024-06-07 19:13:12 -07:00
Benjamin Kitor
b3376e5c76
[Misc] Add args for selecting distributed executor to benchmarks (#5335) 2024-06-08 09:20:16 +08:00
Cheng Li
e69ded7d1c
[Bug Fix] Fix the support check for FP8 CUTLASS (#5352)
Bug description: with torch 2.4.0.dev20240603+cu121, cutlass_fp8_supported outputs False, and the (capability, version) pair before the comparison is (90, 11111111112).

This PR fixes the support check for FP8 CUTLASS (cutlass_fp8_supported), which was introduced in https://github.com/vllm-project/vllm/pull/5183.
2024-06-08 00:42:05 +00:00
Calvinn Ng
767c727a81
Fix DbrxFusedNormAttention missing cache_config (#5340)
Co-authored-by: team <calvinn.ng@ahrefs.com>
2024-06-07 14:10:21 -07:00
Jie Fu (傅杰)
6840a71610
[Misc] Remove unused cuda_utils.h in CPU backend (#5345) 2024-06-07 14:09:13 -07:00
Roger Wang
7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
Dipika Sikka
ca3ea51bde
[Kernel] Dynamic Per-Token Activation Quantization (#5037)
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-07 09:36:26 -07:00
limingshu
dc49fb892c
Add missing ignored_seq_groups in _schedule_chunked_prefill (#5296) 2024-06-07 13:35:42 +00:00
Antoni Baum
18a277b52d
Remove Ray health check (#4693) 2024-06-07 10:01:56 +00:00
Tyler Michael Smith
8d75fe48ca
[Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183)
Switching from torch._scaled_mm to vLLM's CUTLASS FP8 kernels when supported, as we are seeing a 5-15% improvement in e2e performance on neuralmagic/Meta-Llama-3-8B-Instruct-FP8.

See https://docs.google.com/spreadsheets/d/1GiAnmzyGHgZ6zL_LDSTm35Bdrt4A8AaFEurDlISYYA4/ for some quick e2e benchmarks and #5144 for comparisons across different GEMM sizes.
2024-06-07 08:42:35 +00:00
youkaichao
388596c914
[Misc][Utils] allow get_open_port to be called for multiple times (#5333) 2024-06-06 22:15:11 -07:00
Itay Etelis
baa15a9ec3
[Feature][Frontend]: Add support for stream_options in ChatCompletionRequest (#5135) 2024-06-07 03:29:24 +00:00
Jie Fu (傅杰)
15063741e3
[Misc] Missing error message for custom ops import (#5282) 2024-06-06 20:17:21 -07:00
Antoni Baum
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods (#5038) 2024-06-06 19:07:57 -07:00
Antoni Baum
a31cab7556
[Core] Avoid copying prompt/output tokens if no penalties are used (#5289) 2024-06-06 18:12:00 -07:00
Matthew Goldey
828da0d44e
[Frontend] enable passing multiple LoRA adapters at once to generate() (#5300) 2024-06-06 15:48:13 -05:00
Philipp Moritz
abe855d637
[Kernel] Retune Mixtral 8x22b configs for FP8 on H100 (#5294) 2024-06-06 09:29:29 -07:00
liuyhwangyh
4efff036f0
Bugfix: fix broken download of models from ModelScope (#5233)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-06-06 09:28:10 -07:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Breno Faria
7b0a0dfb22
[Frontend][Core] Update Outlines Integration from FSM to Guide (#4109)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-06-05 16:49:12 -07:00
Simon Mo
3a6ae1d33c
[CI] Disable flash_attn backend for spec decode (#5286) 2024-06-05 15:49:27 -07:00
Simon Mo
8f1729b829
[Docs] Add Ray Summit CFP (#5295) 2024-06-05 15:25:18 -07:00
Woosuk Kwon
6a7c7711a2
[Misc] Skip for logits_scale == 1.0 (#5291) 2024-06-05 15:19:02 -07:00