xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-06 18:17:10 +08:00

Author	SHA1	Message	Date
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
Michael Goin	6247bae6c6	[Bugfix] Restrict MacOS CPU detection (#14210) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-04 22:25:27 +08:00
youkaichao	3610fb4930	[doc] add "Failed to infer device type" to faq (#14200) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-04 20:47:06 +08:00
youkaichao	71c4b40562	[sleep mode] error out with expandable_segments (#14189) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-04 18:54:19 +08:00
youkaichao	ac65bc92df	[platform] add debug logging during inferring the device type (#14195) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-04 18:39:16 +08:00
Michael Goin	f78c0be80a	Fix benchmark_moe.py tuning for CUDA devices (#14164)	2025-03-03 21:11:03 -08:00
Zhanwen Chen	66233af7b6	Use math.prod instead of np.prod for trivial ops (#14142)	2025-03-03 21:09:22 -08:00
Rui Qiao	bf13d40972	[core] Pass all driver env vars to ray workers unless excluded (#14099) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-03-04 11:44:17 +08:00
Cody Yu	989f4f430c	[Misc] Remove lru_cache in NvmlCudaPlatform (#14156) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-04 11:09:34 +08:00
Divakar Verma	bb5b640359	[core] moe fp8 block quant tuning support (#14068) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-03-04 01:30:23 +00:00
Travis Johnson	c060b71408	[Model] Add support for GraniteMoeShared models (#13313) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-04 08:04:52 +08:00
iefgnoix	79e4937c65	[v1] Add comments to the new ragged paged attention Pallas kernel (#14155) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-03-03 23:00:55 +00:00
Qubitium-ModelCloud	cd1d3c3df8	[Docs] Add GPTQModel (#14056) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-03-03 21:59:09 +00:00
Michael Goin	19d98e0c7d	[Kernel] Optimize moe intermediate_cache usage (#13625) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-03 16:29:53 -05:00
Michael Goin	2b04c209ee	[Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 (#14100) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-03 14:20:24 -07:00
Mark McLoughlin	ae122b1cbd	[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 19:04:45 +00:00
Nick Hill	872db2be0e	[V1] Simplify stats logging (#14082) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-03 10:34:14 -08:00
Mark McLoughlin	2dfdfed8a0	[V0][Metrics] Deprecate some KV/prefix cache metrics (#14136) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 18:25:46 +00:00
Mark McLoughlin	c41d27156b	[V0][Metrics] Remove unimplemented `vllm:tokens_total` (#14134) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 17:50:22 +00:00
Harry Mellor	91373a0d15	Fix `head_dim` not existing in all model configs (Transformers backend) (#14141) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-03 17:48:11 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348)	2025-03-03 09:24:45 -08:00
Harry Mellor	98175b2816	Improve the docs for `TransformersModel` (#14147) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-03 17:03:05 +00:00
Mark McLoughlin	4167252eaf	[V1] Refactor parallel sampling support (#13774) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 08:15:27 -08:00
Cody Yu	f35f8e2242	[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-03 16:43:14 +08:00
Mengqing Cao	b87c21fc89	[Misc][Platform] Move use allgather to platform (#14010) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-03 15:40:04 +08:00
wang.yuqi	e584b85afd	[Misc] duplicate code in deepseek_v2 (#14106)	2025-03-03 14:10:11 +08:00
Sheng Yao	09e56f9262	[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051)	2025-03-02 17:35:01 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
qux-bbb	bc6ccb9878	[Doc] Source building add clone step (#14086) Signed-off-by: qux-bbb <1147635419@qq.com>	2025-03-02 10:59:50 +00:00
Jun Duan	82fbeae92b	[Misc] Accurately capture the time of loading weights (#14063) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-01 17:20:30 -08:00
Jee Jee Li	cc5e8f6db8	[Model] Add LoRA support for TransformersModel (#13770) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-02 09:17:34 +08:00
Chen Zhang	d54990da47	[v1] Add `__repr__` to KVCacheBlock to avoid recursive print (#14081)	2025-03-01 20:46:02 +00:00
Chen Zhang	b9f1d4294e	[v1][Bugfix] Only cache blocks that are not in the prefix cache (#14073)	2025-03-01 08:25:54 +00:00
Sage Moore	b28246f6ff	[ROCm][V1][Bugfix] Add get_builder_cls method to the ROCmAttentionBackend class (#14065) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-03-01 07:18:32 +00:00
Woosuk Kwon	3b5567a209	[V1][Minor] Do not print attn backend twice (#13985) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-01 07:09:14 +00:00
Isotr0py	fdcc405346	[Doc] Consolidate `whisper` and `florence2` examples (#14050)	2025-02-28 22:49:15 -08:00
Kuntai Du	8994dabc22	[Documentation] Add more deployment guide for Kubernetes deployment (#13841) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-03-01 06:44:24 +00:00
Li, Jiang	02296f420d	[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)	2025-02-28 22:31:01 -08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)	2025-02-28 22:30:59 -08:00
Jee Jee Li	6a84164add	[Bugfix] Add file lock for ModelScope download (#14060) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-01 06:10:28 +00:00
Brayden Zhong	f64ffa8c25	[Docs] Add `pipeline_parallel_size` to optimization docs (#14059) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-01 05:43:54 +00:00
Luka Govedič	bd56c983d6	[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902) Signed-off-by: luka <luka@neuralmagic.com>	2025-02-28 16:20:11 -07:00
Rui Qiao	084bbac8cc	[core] Bump ray to 2.43 (#13994) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-02-28 21:47:44 +00:00
Chen Zhang	28943d36ce	[v1] Move block pool operations to a separate class (#13973) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-28 20:53:31 +00:00
Andrey Talman	b526ca6726	Add RELEASE.md (#13926) Signed-off-by: atalman <atalman@fb.com>	2025-02-28 12:25:50 -08:00
Chen Zhang	e7bd944e08	[v1] Cleanup the BlockTable in InputBatch (#13977) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-02-28 19:03:16 +00:00
iefgnoix	c3b6559a10	[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-02-28 11:01:36 -07:00
Harry Mellor	4be4b26cb7	Fix entrypoint tests for embedding models (#14052)	2025-02-28 08:56:44 -08:00
Brayden Zhong	2aed2c9fa7	[Doc] Fix ROCm documentation (#14041) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-28 16:42:07 +00:00

4912 Commits