xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-17 07:35:45 +08:00

Author	SHA1	Message	Date
Isotr0py	dd2a6a82e3	[Bugfix] Fix internlm2 tensor parallel inference (#8055 )	2024-09-02 23:48:56 +08:00
Lily Liu	e6a26ed037	[SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244 )	2024-09-01 21:23:29 -07:00
Shawn Tan	f8d60145b4	[Model] Add Granite model (#7436 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-09-01 18:37:18 -07:00
Roger Wang	5b86b19954	[Misc] Optional installation of audio related packages (#8063 )	2024-09-01 14:46:57 -07:00
Cyrus Leung	d05f0a9db2	[Bugfix] Fix import error in Phi-3.5-MoE (#8052 )	2024-08-30 22:26:55 -07:00
Wenxiang	1248e8506a	[Model] Adding support for MSFT Phi-3.5-MoE (#7729 ) Co-authored-by: Your Name <you@example.com> Co-authored-by: Zeqi Lin <zelin@microsoft.com> Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>	2024-08-30 13:42:57 -06:00
Jungho Christopher Cho	f97be32d1d	[VLM][Model] TP support for ViTs (#7186 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-30 08:19:27 -07:00
Cyrus Leung	afd39a4511	[Bugfix] Fix import error in Exaone model (#8034 )	2024-08-30 08:03:28 -07:00
Yohan Na	dc13e99348	[MODEL] add Exaone model support (#7819 )	2024-08-29 23:34:20 -07:00
afeldman-nm	428dd1445e	[Core] Logprobs support in Multi-step (#7652 )	2024-08-29 19:19:08 -07:00
chenqianfzh	4664ceaad6	support bitsandbytes 8-bit and FP4 quantized models (#7445 )	2024-08-29 19:09:08 -04:00
Harsha vardhan manoj Bikki	257afc37c5	[Neuron] Adding support for context-length, token-gen buckets. (#7885 ) Co-authored-by: Harsha Bikki <harbikh@amazon.com>	2024-08-29 13:58:14 -07:00
Dipika Sikka	86a677de42	[misc] update tpu int8 to use new vLLM Parameters (#7973 )	2024-08-29 16:46:55 -04:00
Isotr0py	d78789ac16	[Bugfix] Fix incorrect vocab embedding shards for GGUF model in tensor parallelism (#7954 )	2024-08-29 15:54:49 -04:00
Peter Salas	74d5543ec5	[VLM][Core] Fix exceptions on ragged NestedTensors (#7974 )	2024-08-29 03:24:31 +00:00
Mor Zusman	fdd9daafa3	[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651 )	2024-08-28 15:06:52 -07:00
rasmith	e5697d161c	[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386 )	2024-08-28 15:37:47 -04:00
Cyrus Leung	ef9baee3c5	[Bugfix][VLM] Fix incompatibility between #7902 and #7230 (#7948 )	2024-08-28 08:11:18 -07:00
Peter Salas	fab5f53e2d	[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902 )	2024-08-28 01:53:56 +00:00
zifeitong	5340a2dccf	[Model] Add multi-image input support for LLaVA-Next offline inference (#7230 )	2024-08-28 07:09:02 +08:00
Dipika Sikka	fc911880cc	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766 ) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-27 15:07:09 -07:00
Isotr0py	b09c755be8	[Bugfix] Fix phi3v incorrect image_idx when using async engine (#7916 )	2024-08-27 17:36:09 +00:00
Dipika Sikka	015e6cc252	[Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights` (#7825 )	2024-08-26 18:09:34 -06:00
Dipika Sikka	dd9857f5fa	[Misc] Update `gptq_marlin_24` to use vLLMParameters (#7762 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-26 17:44:54 -04:00
Dipika Sikka	665304092d	[Misc] Update `qqq` to use vLLMParameters (#7805 )	2024-08-26 13:16:15 -06:00
Isotr0py	8aaf3d5347	[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783 )	2024-08-25 11:51:20 +00:00
zifeitong	80162c44b1	[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840 )	2024-08-24 18:16:24 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831 )	2024-08-24 00:51:38 -07:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654 )	2024-08-23 23:07:24 -07:00
Dipika Sikka	f1df5dbfd6	[Misc] Update `marlin` to use vLLMParameters (#7803 )	2024-08-23 14:30:52 -04:00
Dipika Sikka	955b5191c9	[Misc] update fp8 to use `vLLMParameter` (#7437 )	2024-08-22 08:36:18 -04:00
Flex Wang	4f419c00a6	Fix ShardedStateLoader for vllm fp8 quantization (#7708 )	2024-08-22 08:25:04 -04:00
Abhinav Goyal	a3fce56b88	[Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830 )	2024-08-22 02:42:24 -07:00
Woosuk Kwon	b3856bef7d	[Misc] Use torch.compile for GemmaRMSNorm (#7642 )	2024-08-22 01:14:13 -07:00
Michael Goin	aae74ef95c	Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527 )" (#7764 )	2024-08-22 03:42:14 +00:00
zifeitong	df1a21131d	[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710 )	2024-08-22 09:36:24 +08:00
Dipika Sikka	8678a69ab5	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527 ) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-21 16:17:10 -07:00
Peter Salas	1ca0d4f86b	[Model] Add UltravoxModel and UltravoxConfig (#7615 )	2024-08-21 22:49:39 +00:00
Isotr0py	12e1c65bc9	[Model] Add AWQ quantization support for InternVL2 model (#7187 )	2024-08-20 23:18:57 -07:00
Lucas Wilkinson	5288c06aa0	[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174 )	2024-08-20 07:09:33 -06:00
jianyizh	e6d811dd13	[XPU] fallback to native implementation for xpu custom op (#7670 )	2024-08-20 00:26:09 -07:00
Zijian Hu	f4fc7337bf	[Bugfix] support `tie_word_embeddings` for all models (#5724 )	2024-08-19 20:00:04 -07:00
Isotr0py	7601cb044d	[Core] Support tensor parallelism for GGUF quantization (#7520 )	2024-08-19 17:30:14 -04:00
Woosuk Kwon	df845b2b46	[Misc] Remove Gemma RoPE (#7638 )	2024-08-19 09:29:31 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
Woosuk Kwon	200a2ffa6b	[Misc] Refactor Llama3 RoPE initialization (#7637 )	2024-08-18 17:18:12 -07:00
Woosuk Kwon	ab7165f2c7	[TPU] Optimize RoPE forward_native2 (#7636 )	2024-08-18 01:15:10 -07:00
Roger Wang	bbf55c4805	[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530 )	2024-08-17 13:30:55 -07:00
Jee Jee Li	1ef13cf92f	[Misc]Fix BitAndBytes exception messages (#7626 )	2024-08-17 12:02:14 -07:00

837 Commits