xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-04 10:09:09 +08:00

Author	SHA1	Message	Date
Russell Bryant	cb84e45ac7	[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 19:13:22 -07:00
Isotr0py	c2a9671510	[Misc] Improve model redirect to accept json dictionary (#16119) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-06 05:51:45 -07:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
Rui Qiao	8dd41d6bcc	[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-01 06:07:53 -07:00
TJian	4965ec42d2	[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-29 03:33:56 -07:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
Yuan Tang	66aa4c0bf4	[Feature] Add middleware to log API Server responses (#15593) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-27 17:49:38 +00:00
wang.yuqi	3f532cb6a6	[Misc] Use model_redirect to redirect the model name to a local folder. (#14116)	2025-03-27 02:21:23 -07:00
Gregory Shtrasberg	ecff8309a3	[ROCm] Env variable to trigger custom PA (#15557) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-03-26 22:46:12 -07:00
Alexander Matveev	b2e85e26f4	[V1] TPU - Revert to exponential padding by default (#15565) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-26 21:35:05 +00:00
vllmellm	5ebf66748b	[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-26 16:30:30 +08:00
Chenyaaang	ac3cd6e83c	[core] add bucket padding to tpu_model_runner (#14995) Signed-off-by: Chenyaaang <llccyy1212@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-25 17:27:22 -04:00
Gregory Shtrasberg	f533b5837f	[ROCm][Kernel] MoE weights padding (#14454) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: charlifu <charlifu@amd.com>	2025-03-24 23:45:30 +00:00
Cyrus Leung	6dd55af6c9	[Doc] Update docs on handling OOM (#15357) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-24 14:29:34 -07:00
Russell Bryant	8abe69b499	[Core] Don't force uppercase for VLLM_LOGGING_LEVEL (#15306) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-24 08:27:30 -07:00
DefTruth	6ebaf9ac71	[Bugfix] consider related env vars for torch.compiled cache hash (#14953) Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com>	2025-03-23 15:53:09 +00:00
Russell Bryant	b877031d80	Remove openvino support in favor of external plugin (#15339) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-22 14:06:39 -07:00
TJian	ec870fba9a	[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-21 22:36:14 -07:00
Siyuan Liu	b15fd2be2a	[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-21 03:05:28 +00:00
Hyesoo Yang	47195057e9	[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>	2025-03-20 19:19:40 -07:00
Mickaël Seznec	a597a57595	[Attention] Flash Attention 3 - fp8 (#14570) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-03-20 01:14:20 -04:00
Woosuk Kwon	99abb8b650	[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 14:31:54 -07:00
Cyrus Leung	3556a41434	[VLM] Limit multimodal input cache by memory (#14805) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-15 02:52:05 -07:00
Lucas Wilkinson	5952d8ab61	[Attention] Get rid of mla cache alignment (#14842) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-15 05:08:25 +00:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Russell Bryant	776dcec8fe	Disable outlines cache by default (#14837)	2025-03-15 03:57:55 +00:00
Lucas Wilkinson	9532c49836	[Attention] MLA get rid of materialization (#14770) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 23:39:02 -07:00
Thien Tran	95d680b862	[Bugfix][IPEX] Add `VLLM_CPU_MOE_PREPACK` to allow disabling MoE prepack when CPU does not support it (#14681) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-13 20:43:18 -07:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Jinzhen Lin	d0feea31c7	[Kernel] optimize performance of gptq marlin kernel when n is small (#14138) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-03-07 11:53:38 -05:00
Tyler Michael Smith	cc2f9b32c8	[Distributed] Add enable_expert_parallel arg (#14305) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 18:54:45 +00:00
Serena	1b7624bf5c	[misc] Add FlashMLA as a new option of VLLM_ATTENTION_BACKEND env (#14267)	2025-03-05 21:28:50 +00:00
Cody Yu	f35f8e2242	[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-03 16:43:14 +08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Rui Qiao	c9944acbf9	[misc] Rename Ray ADAG to Compiled Graph (#13928)	2025-02-26 20:03:28 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634)	2025-02-22 19:19:45 -08:00
Helena Kloosterman	382f66fb08	[Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)	2025-02-22 08:04:12 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)	2025-02-22 05:54:38 -08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Yu-Zhou	d0a7a2769d	[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139) Signed-off-by: yuzhou <yuzhou@habana.ai> Signed-off-by: zhouyu5 <yu.zhou@intel.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-18 19:40:19 -08:00
Roger Wang	dd5ede4440	[V1] Consolidate MM cache size to vllm.envs (#13239)	2025-02-13 20:19:03 -08:00
Lu Fang	042c3419fa	Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path (#12998) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-12 09:06:13 -08:00
youkaichao	bc1bdecebf	[core][distributed] exact ray placement control (#12732) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-06 02:03:19 +08:00
Aviv Keshet	b3a0d01e45	[Core] add and implement `VLLM_LOGITS_PROCESSOR_THREADS` (#12368) Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>	2025-02-04 18:46:26 -08:00
Lucas Wilkinson	75e94309e8	[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676) Signed-off-by: simon-mo <xmo@berkeley.edu> Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-02-04 18:22:24 -08:00
Yang Chen	95460fc513	[Kernel] port sgl moe_align_block_size kernels (#12574) sgl_moe_align_block_size is based on: `ded9fcd09a` moe_align_block_size is based on: `ba5112ff69` Signed-off-by: Yang Chen <yangche@fb.com>	2025-02-03 13:09:50 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Lucas Wilkinson	baeded2569	[Attention] Deepseek v3 MLA support with FP8 compute (#12601) This PR implements the Deepseek V3 support by performing matrix absorption the fp8 weights --------- Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>	2025-01-31 21:52:51 -08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00

130 Commits