xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-17 08:47:03 +08:00

Author	SHA1	Message	Date
Mor Zusman	f13a07b1f8	[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533 )	2024-09-29 17:35:58 -04:00
Jee Jee Li	3d49776bbb	[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199 )	2024-09-29 06:59:45 +00:00
Zilin Zhu	bc2ef1f77c	[Model] Support Qwen2.5-Math-RM-72B (#8896 )	2024-09-28 21:19:39 -07:00
ElizaWszola	d081da0064	[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741 ) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-28 18:19:40 -07:00
Cyrus Leung	e1a3f5e831	[CI/Build] Update models tests & examples (#8874 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-28 09:54:35 -07:00
Lucas Wilkinson	c5d55356f9	[Bugfix] fix for deepseek w4a16 (#8906 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-09-27 13:12:34 -06:00
Luka Govedič	172d1cd276	[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271 )	2024-09-27 14:25:10 -04:00
Isotr0py	6d792d2f31	[Bugfix][VLM] Fix Fuyu batching inference with `max_num_seqs>1` (#8892 )	2024-09-27 01:15:58 -07:00
Roger Wang	4bb98f2190	[Misc] Update config loading for Qwen2-VL and remove Granite (#8837 )	2024-09-26 07:45:30 -07:00
Michael Goin	7193774b1f	[Misc] Support quantization of MllamaForCausalLM (#8822 )	2024-09-25 14:46:22 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Michael Goin	873edda6cf	[Misc] Support FP8 MoE for compressed-tensors (#8588 )	2024-09-25 09:43:36 -07:00
DefTruth	0c4d2ad5e6	[VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614 )	2024-09-25 09:35:53 -07:00
bnellnm	300da09177	[Kernel] Fullgraph and opcheck tests (#8479 )	2024-09-25 08:35:52 -06:00
sohamparikh	3e073e66f1	[Bugfix] load fc bias from config for eagle (#8790 )	2024-09-24 23:16:30 -07:00
Isotr0py	c23953675f	[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770 )	2024-09-24 23:16:11 -07:00
zifeitong	e3dd0692fa	[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250 )	2024-09-25 05:53:43 +00:00
Travis Johnson	01b6f9e1f0	[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-09-24 17:29:56 -07:00
Jee Jee Li	13f9f7a3d0	[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768 )	2024-09-24 17:08:55 -07:00
Lucas Wilkinson	72fc97a0f1	[Bugfix] Fix torch dynamo fixes caused by `replace_parameters` (#8748 )	2024-09-24 14:33:21 -04:00
Alex Brooks	8ff7ced996	[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-24 07:36:46 +00:00
Peter Salas	3f06bae907	[Core][Model] Support loading weights by ID within models (#7931 )	2024-09-24 07:14:15 +00:00
jiqing-feng	5f7bb58427	Fix typical acceptance sampler with correct recovered token ids (#8562 )	2024-09-23 12:32:27 -07:00
Lucas Wilkinson	86e9c8df29	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701 ) Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-23 13:46:26 -04:00
Jani Monoses	f2bd246c17	[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707 )	2024-09-23 14:43:09 +00:00
Yanyi Liu	a79e522984	[Model] Support pp for qwen2-vl (#8696 )	2024-09-23 13:46:59 +00:00
Lily Liu	c6bd70d772	[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701 )	2024-09-22 12:34:14 -07:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Cyrus Leung	06ed2815e2	[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407 )	2024-09-22 12:24:21 +00:00
Isotr0py	13d88d4137	[Bugfix] Refactor composite weight loading logic (#8656 )	2024-09-22 04:33:27 +00:00
Divakar Verma	9dc7c6c7f3	[dbrx] refactor dbrx experts to extend FusedMoe class (#8518 )	2024-09-21 15:09:39 -06:00
rasmith	ec4aaad812	[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646 )	2024-09-21 09:20:54 +00:00
Cyrus Leung	5e85f4f82a	[VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687 )	2024-09-20 23:28:56 -07:00
zyddnys	0f961b3ce9	[Bugfix] Fix incorrect llava next feature size calculation (#8496 )	2024-09-20 22:48:32 +00:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922 )	2024-09-20 09:31:41 -07:00
Amit Garg	18ae428a0d	[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571 )	2024-09-20 08:54:02 +08:00
盏一	e42c634acb	[Core] simplify logits resort in _apply_top_k_top_p (#8619 )	2024-09-19 18:28:25 +00:00
Roger Wang	02c9afa2d0	Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (#8593 )	2024-09-19 04:14:28 +00:00
Tyler Michael Smith	db9120cded	[Kernel] Change interface to Mamba selective_state_update for continuous batching (#8039 )	2024-09-18 20:05:06 +00:00
Gregory Shtrasberg	b3195bc9e4	[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380 ) Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 10:41:08 -07:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Tyler Michael Smith	8110e44529	[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (#8012 )	2024-09-17 23:44:27 +00:00
Joe Runde	98f9713399	[Bugfix] Fix TP > 1 for new granite (#8544 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-17 23:17:08 +00:00
chenqianfzh	9855b99502	[Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434 )	2024-09-17 08:09:12 -07:00
sroy745	1009e93c5d	[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631 )	2024-09-17 07:35:01 -07:00
Roger Wang	ee2bceaaa6	[Misc][Bugfix] Disable guided decoding for mistral tokenizer (#8521 )	2024-09-16 22:22:45 -07:00
Simon Mo	546034b466	[refactor] remove triton based sampler (#8524 )	2024-09-16 20:04:48 -07:00
Luka Govedič	5d73ae49d6	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270 )	2024-09-16 11:52:40 -07:00


782 Commits