xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-24 08:17:07 +08:00

Author	SHA1	Message	Date
Chendi.Xue	a6149aa587	[OOT] Support sync_model_loading for OOT (#25126 ) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>	2025-09-19 05:41:53 +00:00
liuzhenwei	e599e2c65e	[XPU][P/D] Add XPU support in NixlConnector (#22436 ) Signed-off-by: zhenwei <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 21:03:12 -07:00
Matthew Bonanni	19fe1a0510	[Kernel] Add FP8 support with FlashMLA backend (#22668 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-08-22 02:26:32 +00:00
Chengji Yao	e9d6a3db69	[TPU] make ptxla not imported when using tpu_commons (#23081 ) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>	2025-08-19 11:46:42 +08:00
fhl2000	74f441f4b5	[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-08-15 10:01:39 -04:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Yongye Zhu	007dd90859	[gpt-oss] Enable gpt-oss on ampere (#22714 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-12 03:21:44 -07:00
Konrad Zawora	c17231e827	Fix kv_cache_dtype handling for out-of-tree HPU plugin (#21302 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Chendi.Xue <chendi.xue@intel.com>	2025-07-21 23:35:14 -07:00
Chengji Yao	3a1d8940ae	[TPU] support fp8 kv cache quantization (#19292 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-20 03:01:00 +00:00
Nick Hill	ffbcc9e757	[BugFix] Fix `VllmConfig()` construction on all platforms (#20695 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-10 07:00:20 +00:00
Kunshang Ji	0b407479ef	[misc]refactor `Platform.set_device` method (#20262 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-09 01:39:47 +00:00
Yang Yang	6e2c19ce22	[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410 ) Signed-off-by: dbyoung18 <yang5.yang@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-07 04:32:32 +00:00
Woosuk Kwon	e202dd2736	[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-07-06 08:48:13 -07:00
Chenyaaang	2d7620c3eb	[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-25 15:51:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
wangxiyuan	721fb9b181	[Platform] Move platform check to right place (#18470 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-22 12:11:28 -07:00
Siyuan Liu	48ac2bed5b	[Hardware][TPU] Optionally import for TPU backend (#18269 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com> Co-authored-by: Carol Zheng <cazheng@google.com> Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com> Co-authored-by: Hongmin Fan <fanhongmin@google.com>	2025-05-17 15:23:12 +08:00
Harry Mellor	c8ea982d9b	Update deprecated type hinting in `platform`, `plugins`, `triton_utils`, `vllm_flash_attn` (#18129 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 05:28:16 -07:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Akshat Tripathi	c20ef40fd0	[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-05-07 16:28:47 -04:00
Jevin Jiang	621ca2c0ab	[TPU] Increase block size and reset block shapes (#16458 )	2025-05-06 13:55:04 -04:00
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
Harry Mellor	b6dd32aa07	Make name of `compressed-tensors` quant method consistent across vLLM (#17255 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:28:13 +00:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Joe Runde	e1b004839a	[Hardware] Add processor inputs to platform validation (#16680 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-16 09:28:42 -07:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Michael Goin	baada0e737	[Bugfix][TPU] Fix TPU validate_request (#16369 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-04-10 12:55:12 +08:00
Joe Runde	cb391d85dc	[Hardware] add platform-specific request validation api (#16291 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-09 12:50:01 -07:00
Shanshan Shen	e9ba99f296	[V1][Structured Output] Add `supports_structured_output()` method to Platform (#16148 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-07 11:06:24 +00:00
Joe Runde	5f063a80bd	[bugfix] add supports_v1 platform interface (#15417 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-03-25 15:00:32 -04:00
Richard Liu	a8f12a63fd	Fix env vars for running Ray distributed backend on GKE (#15166 ) Signed-off-by: Richard Liu <ricliu@google.com>	2025-03-20 14:59:33 +00:00
Alexander Matveev	7888e1d0a3	[V1] TPU - Enable prefix caching by default (#14773 )	2025-03-13 20:40:05 -07:00
Siyuan Liu	1bc3b739c4	[V1][TPU] Add assertion on multi-step-scheduler (#14707 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-12 21:37:58 -07:00
Mengqing Cao	b87c21fc89	[Misc][Platform] Move use allgather to platform (#14010 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-03 15:40:04 +08:00
youkaichao	a0231b7c25	[platform] add base class for communicators (#13208 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-16 22:14:22 +08:00
Alexander Matveev	45f90bcbba	[WIP] TPU V1 Support Refactored (#13049 )	2025-02-14 00:21:53 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00
youkaichao	ad34c0df0f	[core] platform agnostic executor via collective_rpc (#11256 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-15 13:45:21 +08:00
youkaichao	458e63a2c6	[platform] add device_control env var (#12009 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-13 20:59:09 +08:00
youkaichao	89ce62a316	[platform] add ray_device_key (#11948 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-13 16:20:52 +08:00
wangxiyuan	405eb8e396	[platform] Allow platform specify attention backend (#11609 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-01-09 21:46:50 +08:00
Robert Shaw	56fe4c297c	[TPU][Quantization] TPU `W8A8` (#11785 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-08 19:33:29 +00:00
wangxiyuan	e88db68cf5	[Platform] platform agnostic for EngineArgs initialization (#11225 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-12-16 22:11:06 -08:00
wangxiyuan	aea2fc38c3	[Platform] Move `async output` check to platform (#10768 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-12-09 17:24:46 +00:00
wangxiyuan	661175bc82	[platform] Add verify_quantization in platform. (#10757 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-11-29 15:22:21 +00:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00

62 Commits