xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 00:27:21 +08:00

Author	SHA1	Message	Date
Szymon Ożóg	aa375dca9f	[Bugfix] Missing quant_config in deepseek embedding layer (#12836)	2025-02-06 21:35:09 -08:00
ZSL98	433c4a4923	Make vllm compatible with verl (#12824) Co-authored-by: zhangshulai <zhangshulai@bytedance.com>	2025-02-07 11:54:20 +08:00
Lucas Wilkinson	ef533d25fb	[Bugfix] FA2 illegal memory access (#12848)	2025-02-06 19:54:07 -08:00
Kevin H. Luu	b260782357	[misc] Revert # 12833 (#12857) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-06 16:29:12 -08:00
Lu Fang	741429a4cd	[MISC] Check space in the file names in the pre commit checks (#12804) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-06 15:36:21 -08:00
Yu Chin Fabian Lim	aff404571b	Add Bamba Model (#10909) Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-06 15:22:42 -08:00
Varun Sundar Rabindranath	467a96a541	[V1] LoRA Support (#10957) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-06 09:32:51 -08:00
Isotr0py	8108ac841d	[Bugfix] Fix unsupported FA version check for Turing GPU (#12828)	2025-02-06 09:18:22 -08:00
Jitse Klomp	afe74f7a96	[Doc] double quote cmake package in build.inc.md (#12840)	2025-02-06 09:17:55 -08:00
youkaichao	09b95e36ab	[torch.compile] PyTorch 2.6 and nightly compatibility (#12393) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-07 01:09:07 +08:00
Isotr0py	85ac82d228	[Kernel] Make rotary_embedding ops more flexible with input shape (#12777)	2025-02-06 08:46:13 -08:00
Cyrus Leung	1e57b1ee63	[Misc] Remove unnecessary decode call (#12833)	2025-02-06 08:45:44 -08:00
Kevin H. Luu	e152f29502	[misc] Reduce number of config file requests to HuggingFace (#12797) Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal> Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-06 14:59:18 +00:00
Lucas Wilkinson	c786e757fa	[Attention] Use FA3 for MLA on Hopper (#12807) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-02-06 11:43:12 +00:00
Simon Mo	cefd56ee35	[Docs] Add Google Cloud Slides (#12814)	2025-02-06 01:02:38 -08:00
Dipika Sikka	7ca9934fe7	[Misc] Update w2 scale loading for GPTQMarlinMoE (#12757)	2025-02-06 01:02:14 -08:00
youkaichao	0408efc6d0	[Misc] Improve error message for incorrect pynvml (#12809) Signed-off-by: youkaichao <youkaichao@gmail.com> (tag: v0.7.2)	2025-02-06 15:23:50 +08:00
Michael Goin	449d1bce02	[Misc] Remove duplicated DeepSeek V2/V3 model definition (#12793)	2025-02-05 23:16:20 -08:00
Harry Mellor	1a6fcad4c9	Improve `TransformersModel` UX (#12785)	2025-02-05 22:24:57 -08:00
Lu Fang	56534cd577	[Bugfix] Fix the test_ultravox.py's license (#12806) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-06 13:25:54 +08:00
Sumit Vij	d88506dda4	[Model] LoRA Support for Ultravox model (#11253)	2025-02-05 19:54:13 -08:00
Lu Fang	9cdea30b4f	[Misc][Easy] Remove the space from the file name	2025-02-05 19:23:35 -08:00
Lucas Wilkinson	76abd0c881	[Bugfix] Better FP8 supported defaults	2025-02-05 19:22:19 -08:00
Gregory Shtrasberg	5b19b93082	[ROCm][Kernel] Using the correct warp_size value	2025-02-05 19:15:08 -08:00
Cyrus Leung	75404d041b	[VLM] Update compatibility with transformers 4.49	2025-02-05 19:09:45 -08:00
Roger Wang	bf3b79efb8	[VLM] Qwen2.5-VL	2025-02-05 13:31:38 -08:00
Russell Bryant	9a5b1554b4	[Docs] Drop duplicate [source] links	2025-02-05 13:30:50 -08:00
Cyrus Leung	a4ce74c14a	[VLM] Use shared field to pass token ids to model	2025-02-05 13:30:46 -08:00
Rahul Tuli	3b2005e1db	Add: Support for Sparse24Bitmask Compressed Models	2025-02-05 13:30:43 -08:00
Sanju C Sudhakaran	af8486de49	[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)	2025-02-05 13:29:45 -08:00
Chen Zhang	4c3aac51e1	Merging PR #12536 Merged via CLI script	2025-02-05 13:24:26 -08:00
youkaichao	bc1bdecebf	[core][distributed] exact ray placement control (#12732) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-06 02:03:19 +08:00
Akash kaothalkar	022bcc701a	[Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_for_pytorch'' for --tensor-parallel-size more than 1 (#12546)	2025-02-04 23:11:02 -08:00
Michael Goin	c53dc466b1	[Doc] Remove performance warning for auto_awq.md (#12743)	2025-02-04 22:43:11 -08:00
Nick Hill	3d09e592a8	[V1][Misc] Shorten `FinishReason` enum and use constant strings (#12760)	2025-02-04 22:43:02 -08:00
Harry Mellor	fcf2e3d7fc	[Bugfix] Fix OpenVINO model runner (#12750)	2025-02-04 22:42:46 -08:00
Michael Goin	58b218d7ae	[Doc] Update PR Reminder with link to Developer Slack (#12748)	2025-02-04 22:42:09 -08:00
Kyle Sayers	7ff7a638b6	[Model][Quant] Fix GLM, Fix fused module mappings for quantization (#12634) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-02-05 05:32:06 +00:00
Dipika Sikka	686006a220	[Misc] Bump the compressed-tensors version (#12736)	2025-02-04 20:44:48 -08:00
Isotr0py	98fd089fc9	[VLM] Add MLA with pure RoPE support for deepseek-vl2 models (#12729)	2025-02-04 20:44:26 -08:00
Harry Mellor	249824c3bf	Refactor `Linear` handling in `TransformersModel` (#12727) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-05 04:31:12 +00:00
Aleksandr Malyshev	64862d106e	[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (#12713) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-02-05 03:58:22 +00:00
Aviv Keshet	b3a0d01e45	[Core] add and implement `VLLM_LOGITS_PROCESSOR_THREADS` (#12368) Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>	2025-02-04 18:46:26 -08:00
Lucas Wilkinson	75e94309e8	[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676) Signed-off-by: simon-mo <xmo@berkeley.edu> Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-02-04 18:22:24 -08:00
Mark McLoughlin	233df6f5c4	[V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-04 19:46:54 -05:00
Cyrus Leung	18016a5e62	[Bugfix] Fix CI failures for InternVL and Mantis models (#12728) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-04 23:54:23 +08:00
Sophie du Couédic	649550f27e	[Build] update requirements of no-device for plugin usage (#12630) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>	2025-02-04 21:19:12 +08:00
Kero Liang	62467a834a	Avoid unnecessary multi-modal input data copy when len(batch) == 1 (#12722) Signed-off-by: imkero <kerorek@outlook.com>	2025-02-04 21:03:19 +08:00
Michael Greenbaum	6469038b14	[Bugfix] Fix loading of fine-tuned models based on Phi-3-Small (#12689) Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com> Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>	2025-02-04 20:58:48 +08:00
Isotr0py	815079de8e	[VLM] merged multimodal processor and V1 support for idefics3 (#12660) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-02-04 20:00:51 +08:00


4471 Commits