Author | Commit | Message | Date
William Lin | 1d5e397aa4 | [Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (#8172) | 2024-09-10 23:46:08 +00:00
Alexander Matveev | 22f3a4bc6c | [Bugfix] Ensure multistep lookahead allocation is compatible with cuda graph max capture (#8340) | 2024-09-10 16:00:35 -07:00
Cody Yu | b1f3e18958 | [MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled (#8342) | 2024-09-10 22:28:28 +00:00
Prashant Gupta | 04e7c4e771 | [Misc] remove peft as dependency for prompt models (#8162) | 2024-09-10 17:21:56 -04:00
Kevin Lin | 5faedf1b62 | [Spec Decode] Move ops.advance_step to flash attn advance_step (#8224) | 2024-09-10 13:18:14 -07:00
Cyrus Leung | 8c054b7a62 | [Frontend] Clean up type annotations for mistral tokenizer (#8314) | 2024-09-10 16:49:11 +00:00
Cyrus Leung | da1a844e61 | [Bugfix] Fix missing post_layernorm in CLIP (#8155) | 2024-09-10 08:22:50 +00:00
Dipika Sikka | 6cd5e5b07e | [Misc] Fused MoE Marlin support for GPTQ (#8217) | 2024-09-09 23:02:52 -04:00
Kyle Sayers | c7cb5c3335 | [Misc] GPTQ Activation Ordering (#8135) | 2024-09-09 16:27:26 -04:00
Vladislav Kruglikov | f9b4a2d415 | [Bugfix] Correct adapter usage for cohere and jamba (#8292) | 2024-09-09 11:20:46 -07:00
Adam Lugowski | 58fcc8545a | [Frontend] Add progress reporting to run_batch.py (#8060); Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io> | 2024-09-09 11:16:37 -07:00
Kyle Mistele | 08287ef675 | [Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272) | 2024-09-09 10:45:11 -04:00
Alexander Matveev | 4ef41b8476 | [Bugfix] Fix async postprocessor in case of preemption (#8267) | 2024-09-07 21:01:51 -07:00
Isotr0py | 36bf8150cc | [Model][VLM] Decouple weight loading logic for Paligemma (#8269) | 2024-09-07 17:45:44 +00:00
Isotr0py | e807125936 | [Model][VLM] Support multi-images inputs for InternVL2 models (#8201) | 2024-09-07 16:38:23 +08:00
Cyrus Leung | 9f68e00d27 | [Bugfix] Fix broken OpenAI tensorizer test (#8258) | 2024-09-07 08:02:39 +00:00
youkaichao | ce2702a923 | [tpu][misc] fix typo (#8260) | 2024-09-06 22:40:46 -07:00
Cyrus Leung | 2f707fcb35 | [Model] Multi-input support for LLaVA (#8238) | 2024-09-07 02:57:24 +00:00
William Lin | 12dd715807 | [misc] [doc] [frontend] LLM torch profiler support (#7943) | 2024-09-06 17:48:48 -07:00
Patrick von Platen | 29f49cd6e3 | [Model] Allow loading from original Mistral format (#8168); Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-09-06 17:02:05 -06:00
Dipika Sikka | 23f322297f | [Misc] Remove SqueezeLLM (#8220) | 2024-09-06 16:29:03 -06:00
rasmith | 9db52eab3d | [Kernel] [Triton] Memory optimization for awq_gemm and awq_dequantize, 2x throughput (#8248) | 2024-09-06 16:26:09 -06:00
Rui Qiao | de80783b69 | [Misc] Use ray[adag] dependency instead of cuda (#7938) | 2024-09-06 09:18:35 -07:00
Nick Hill | baa5467547 | [BugFix] Fix Granite model configuration (#8216) | 2024-09-06 11:39:29 +08:00
Jiaxin Shan | db3bf7c991 | [Core] Support load and unload LoRA in api server (#6566); Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> | 2024-09-05 18:10:33 -07:00
Michael Goin | 2ee45281a5 | Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165) | 2024-09-05 11:09:46 -04:00
Alex Brooks | 9da25a88aa | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029); Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>; Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-09-05 12:48:10 +00:00
manikandan.tm@zucisystems.com | 8685ba1a1e | Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860) | 2024-09-05 11:33:37 +00:00
Elfie Guo | e39ebf5cf5 | [Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173) | 2024-09-05 05:12:26 +00:00
Woosuk Kwon | 4624d98dbd | [Misc] Clean up RoPE forward_native (#8076) | 2024-09-04 20:31:48 -07:00
Simon Mo | 32e7db2536 | Bump version to v0.6.0 (#8166) | 2024-09-04 16:34:27 -07:00
Harsha vardhan manoj Bikki | 008cf886c9 | [Neuron] Adding support for adding/ overriding neuron configuration a… (#8062); Co-authored-by: Harsha Bikki <harbikh@amazon.com> | 2024-09-04 16:33:43 -07:00
Kyle Mistele | e02ce498be | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649); Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>; Co-authored-by: Kyle Mistele <kyle@constellate.ai> | 2024-09-04 13:18:13 -07:00
wnma | d3311562fb | [Bugfix] remove post_layernorm in siglip (#8106) | 2024-09-04 18:55:37 +08:00
Cyrus Leung | 855c262a6b | [Frontend] Multimodal support in offline chat (#8098) | 2024-09-04 05:22:17 +00:00
Peter Salas | 2be8ec6e71 | [Model] Add Ultravox support for multiple audio chunks (#7963) | 2024-09-04 04:38:21 +00:00
Dipika Sikka | e16fa99a6a | [Misc] Update fbgemmfp8 to use vLLMParameters (#7972); Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-09-03 20:12:41 -06:00
Woosuk Kwon | 61f4a93d14 | [TPU][Bugfix] Use XLA rank for persistent cache path (#8137) | 2024-09-03 18:35:33 -07:00
Nick Hill | d4db9f53c8 | [Benchmark] Add --async-engine option to benchmark_throughput.py (#7964) | 2024-09-03 20:57:41 -04:00
Dipika Sikka | 2188a60c7e | [Misc] Update GPTQ to use vLLMParameters (#7976) | 2024-09-03 17:21:44 -04:00
Woosuk Kwon | 0af3abe3d3 | [TPU][Bugfix] Fix next_token_ids shape (#8128) | 2024-09-03 13:29:24 -07:00
Antoni Baum | 652c83b697 | [Misc] Raise a more informative exception in add/remove_logger (#7750) | 2024-09-03 12:28:25 -07:00
Alexander Matveev | 6d646d08a2 | [Core] Optimize Async + Multi-step (#8050) | 2024-09-03 18:50:29 +00:00
Isotr0py | ec266536b7 | [Bugfix][VLM] Add fallback to SDPA for ViT model running on CPU backend (#8061) | 2024-09-03 21:37:52 +08:00
Woosuk Kwon | 0fbc6696c2 | [Bugfix] Fix single output condition in output processor (#7881) | 2024-09-02 20:35:42 -07:00
wang.yuqi | 6e36f4fa6c | [Bugfix] Fix #7592 vllm 0.5.4 enable_chunked_prefill throughput is slightly lower than 0.5.3~0.5.0. (#7874) | 2024-09-02 14:20:12 -07:00
Isotr0py | dd2a6a82e3 | [Bugfix] Fix internlm2 tensor parallel inference (#8055) | 2024-09-02 23:48:56 +08:00
Isotr0py | 4ca65a9763 | [Core][Bugfix] Accept GGUF model without .gguf extension (#8056) | 2024-09-02 08:43:26 -04:00
Woosuk Kwon | e2b2aa5a0f | [TPU] Align worker index with node boundary (#7932) | 2024-09-01 23:09:46 -07:00
Lily Liu | e6a26ed037 | [SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244) | 2024-09-01 21:23:29 -07:00