xinyun/vllm - vllm - 丝路新云-代码仓

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-01-15 01:54:29 +08:00

Author	SHA1	Message	Date
saumya-saran	b28298f2f4	[Bugfix] Validate SamplingParam n is an int (#8548)	2024-09-20 12:46:02 -07:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922)	2024-09-20 09:31:41 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315)	2024-09-20 06:20:56 +00:00
William Lin	9e5ec35b1f	[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474)	2024-09-19 20:49:54 -07:00
Amit Garg	18ae428a0d	[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)	2024-09-20 08:54:02 +08:00
盏一	e42c634acb	[Core] simplify logits resort in _apply_top_k_top_p (#8619)	2024-09-19 18:28:25 +00:00
Charlie Fu	9cc373f390	[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577)	2024-09-19 17:37:57 +00:00
Nick Hill	76515f303b	[Frontend] Use MQLLMEngine for embeddings models too (#8584)	2024-09-19 12:51:06 -04:00
Roger Wang	02c9afa2d0	Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (#8593)	2024-09-19 04:14:28 +00:00
sroy745	3118f63385	[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata construction during decode of encoder-decoder models. (#8545)	2024-09-19 02:24:15 +00:00
Joe Runde	0d47bf3bf4	[Bugfix] add `dead_error` property to engine client (#8574) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-18 22:10:01 +00:00
Nick Hill	d9cd78eb71	[BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572)	2024-09-18 20:17:55 +00:00
Tyler Michael Smith	db9120cded	[Kernel] Change interface to Mamba selective_state_update for continuous batching (#8039)	2024-09-18 20:05:06 +00:00
Gregory Shtrasberg	b3195bc9e4	[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380) Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 10:41:08 -07:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Russell Bryant	d65798f78c	[Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-18 16:10:27 +00:00
afeldman-nm	a8c1d161a7	[Core] Prompt logprobs support in Multi-step (#8199)	2024-09-18 08:38:43 -07:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
Jiaxin Shan	e351572900	[Misc] Add argument to disable FastAPI docs (#8554)	2024-09-18 09:51:59 +00:00
Tyler Michael Smith	8110e44529	[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (#8012)	2024-09-17 23:44:27 +00:00
Joe Runde	98f9713399	[Bugfix] Fix TP > 1 for new granite (#8544) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-17 23:17:08 +00:00
Nick Hill	56c3de018c	[Misc] Don't dump contents of kvcache tensors on errors (#8527)	2024-09-17 12:24:29 -07:00
Patrick von Platen	a54ed80249	[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-17 17:50:37 +00:00
chenqianfzh	9855b99502	[Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)	2024-09-17 08:09:12 -07:00
sroy745	1009e93c5d	[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)	2024-09-17 07:35:01 -07:00
Rui Qiao	cbdb252259	[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (#8509) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-09-17 00:06:26 -07:00
youkaichao	99aa4eddaf	[torch.compile] register allreduce operations as custom ops (#8526)	2024-09-16 22:57:57 -07:00
Roger Wang	ee2bceaaa6	[Misc][Bugfix] Disable guided decoding for mistral tokenizer (#8521)	2024-09-16 22:22:45 -07:00
Alex Brooks	1c1bb388e0	[Frontend] Improve Nullable kv Arg Parsing (#8525) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-09-17 04:17:32 +00:00
Simon Mo	546034b466	[refactor] remove triton based sampler (#8524)	2024-09-16 20:04:48 -07:00
Kevin Lin	47f5e03b5b	[Bugfix] Bind api server port before starting engine (#8491)	2024-09-16 13:56:28 -07:00
Luka Govedič	5d73ae49d6	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)	2024-09-16 11:52:40 -07:00
Nick Hill	acd5511b6d	[BugFix] Fix clean shutdown issues (#8492)	2024-09-16 09:33:46 -07:00
lewtun	837c1968f9	[Frontend] Expose revision arg in OpenAI server (#8501)	2024-09-16 15:55:26 +00:00
ElizaWszola	a091e2da3e	[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032) Co-authored-by: Dipika <dipikasikka1@gmail.com>	2024-09-16 09:47:19 -06:00
Isotr0py	fc990f9795	[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kernel (#8357)	2024-09-15 16:51:44 -06:00
Chris	3724d5f6b5	[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations (#8490)	2024-09-15 04:20:05 +00:00
Woosuk Kwon	50e9ec41fc	[TPU] Implement multi-step scheduling (#8489)	2024-09-14 16:58:31 -07:00
youkaichao	47790f3e32	[torch.compile] add a flag to disable custom op (#8488)	2024-09-14 13:07:16 -07:00
youkaichao	a36e070dad	[torch.compile] fix functionalization (#8480)	2024-09-14 09:46:04 -07:00
ywfang	8a0cf1ddc3	[Model] support minicpm3 (#8297) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-14 14:50:26 +00:00
Charlie Fu	1ef0d2efd0	[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)	2024-09-13 17:01:11 -07:00
Kunshang Ji	851725202a	[Hardware][intel GPU] bump up ipex version to 2.3 (#8365) Co-authored-by: Yan Ma <yan.ma@intel.com>	2024-09-13 16:54:34 -07:00
Simon Mo	9ba0817ff1	bump version to v0.6.1.post2 (#8473)	2024-09-13 11:35:00 -07:00
Nick Hill	18e9e1f7b3	[HotFix] Fix final output truncation with stop string + streaming (#8468)	2024-09-13 11:31:12 -07:00
youkaichao	0a4806f0a9	[plugin][torch.compile] allow to add custom compile backend (#8445)	2024-09-13 09:32:42 -07:00
Jee Jee Li	06311e2956	[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)	2024-09-13 07:58:28 +00:00
Simon Mo	acda0b35d0	bump version to v0.6.1.post1 (#8440)	2024-09-12 21:39:49 -07:00

1760 Commits