Cody Yu | 989f4f430c | 2025-03-04 11:09:34 +08:00
    [Misc] Remove lru_cache in NvmlCudaPlatform (#14156)
    Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>

Divakar Verma | bb5b640359 | 2025-03-04 01:30:23 +00:00
    [core] moe fp8 block quant tuning support (#14068)
    Signed-off-by: Divakar Verma <divakar.verma@amd.com>

Travis Johnson | c060b71408 | 2025-03-04 08:04:52 +08:00
    [Model] Add support for GraniteMoeShared models (#13313)
    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

iefgnoix | 79e4937c65 | 2025-03-03 23:00:55 +00:00
    [v1] Add comments to the new ragged paged attention Pallas kernel (#14155)
    Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
    Co-authored-by: Michael Goin <mgoin64@gmail.com>

Qubitium-ModelCloud | cd1d3c3df8 | 2025-03-03 21:59:09 +00:00
    [Docs] Add GPTQModel (#14056)
    Signed-off-by: mgoin <mgoin64@gmail.com>
    Co-authored-by: mgoin <mgoin64@gmail.com>

Michael Goin | 19d98e0c7d | 2025-03-03 16:29:53 -05:00
    [Kernel] Optimize moe intermediate_cache usage (#13625)
    Signed-off-by: mgoin <mgoin64@gmail.com>

Michael Goin | 2b04c209ee | 2025-03-03 14:20:24 -07:00
    [Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 (#14100)
    Signed-off-by: mgoin <mgoin64@gmail.com>

Mark McLoughlin | ae122b1cbd | 2025-03-03 19:04:45 +00:00
    [WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
    Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Nick Hill | 872db2be0e | 2025-03-03 10:34:14 -08:00
    [V1] Simplify stats logging (#14082)
    Signed-off-by: Nick Hill <nhill@redhat.com>

Mark McLoughlin | 2dfdfed8a0 | 2025-03-03 18:25:46 +00:00
    [V0][Metrics] Deprecate some KV/prefix cache metrics (#14136)
    Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Mark McLoughlin | c41d27156b | 2025-03-03 17:50:22 +00:00
    [V0][Metrics] Remove unimplemented vllm:tokens_total (#14134)
    Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Harry Mellor | 91373a0d15 | 2025-03-03 17:48:11 +00:00
    Fix head_dim not existing in all model configs (Transformers backend) (#14141)
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

TJian | 848a6438ae | 2025-03-03 09:24:45 -08:00
    [ROCm] Faster Custom Paged Attention kernels (#12348)

Harry Mellor | 98175b2816 | 2025-03-03 17:03:05 +00:00
    Improve the docs for TransformersModel (#14147)
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Mark McLoughlin | 4167252eaf | 2025-03-03 08:15:27 -08:00
    [V1] Refactor parallel sampling support (#13774)
    Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Cody Yu | f35f8e2242 | 2025-03-03 16:43:14 +08:00
    [Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921)
    Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>

Mengqing Cao | b87c21fc89 | 2025-03-03 15:40:04 +08:00
    [Misc][Platform] Move use allgather to platform (#14010)
    Signed-off-by: Mengqing Cao <cmq0113@163.com>

wang.yuqi | e584b85afd | 2025-03-03 14:10:11 +08:00
    [Misc] duplicate code in deepseek_v2 (#14106)

Sheng Yao | 09e56f9262 | 2025-03-02 17:35:01 -08:00
    [Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051)

Harry Mellor | cf069aa8aa | 2025-03-02 17:34:51 -08:00
    Update deprecated Python 3.8 typing (#13971)

Ce Gao | bf33700ecd | 2025-03-02 14:49:42 -05:00
    [v0][structured output] Support reasoning output (#12955)
    Signed-off-by: Ce Gao <cegao@tensorchord.ai>

qux-bbb | bc6ccb9878 | 2025-03-02 10:59:50 +00:00
    [Doc] Source building add clone step (#14086)
    Signed-off-by: qux-bbb <1147635419@qq.com>

Jun Duan | 82fbeae92b | 2025-03-01 17:20:30 -08:00
    [Misc] Accurately capture the time of loading weights (#14063)
    Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>

Jee Jee Li | cc5e8f6db8 | 2025-03-02 09:17:34 +08:00
    [Model] Add LoRA support for TransformersModel (#13770)
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Chen Zhang | d54990da47 | 2025-03-01 20:46:02 +00:00
    [v1] Add __repr__ to KVCacheBlock to avoid recursive print (#14081)

Chen Zhang | b9f1d4294e | 2025-03-01 08:25:54 +00:00
    [v1][Bugfix] Only cache blocks that are not in the prefix cache (#14073)

Sage Moore | b28246f6ff | 2025-03-01 07:18:32 +00:00
    [ROCm][V1][Bugfix] Add get_builder_cls method to the ROCmAttentionBackend class (#14065)
    Signed-off-by: Sage Moore <sage@neuralmagic.com>

Woosuk Kwon | 3b5567a209 | 2025-03-01 07:09:14 +00:00
    [V1][Minor] Do not print attn backend twice (#13985)
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Isotr0py | fdcc405346 | 2025-02-28 22:49:15 -08:00
    [Doc] Consolidate whisper and florence2 examples (#14050)

Kuntai Du | 8994dabc22 | 2025-03-01 06:44:24 +00:00
    [Documentation] Add more deployment guide for Kubernetes deployment (#13841)
    Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
    Signed-off-by: Kuntai Du <kuntai@uchicago.edu>

Li, Jiang | 02296f420d | 2025-02-28 22:31:01 -08:00
    [Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)

YajieWang | 6a92ff93e1 | 2025-02-28 22:30:59 -08:00
    [Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)

Jee Jee Li | 6a84164add | 2025-03-01 06:10:28 +00:00
    [Bugfix] Add file lock for ModelScope download (#14060)
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Brayden Zhong | f64ffa8c25 | 2025-03-01 05:43:54 +00:00
    [Docs] Add pipeline_parallel_size to optimization docs (#14059)
    Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

Luka Govedič | bd56c983d6 | 2025-02-28 16:20:11 -07:00
    [torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)
    Signed-off-by: luka <luka@neuralmagic.com>

Rui Qiao | 084bbac8cc | 2025-02-28 21:47:44 +00:00
    [core] Bump ray to 2.43 (#13994)
    Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Chen Zhang | 28943d36ce | 2025-02-28 20:53:31 +00:00
    [v1] Move block pool operations to a separate class (#13973)
    Signed-off-by: Chen Zhang <zhangch99@outlook.com>
    Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

Andrey Talman | b526ca6726 | 2025-02-28 12:25:50 -08:00
    Add RELEASE.md (#13926)
    Signed-off-by: atalman <atalman@fb.com>

Chen Zhang | e7bd944e08 | 2025-02-28 19:03:16 +00:00
    [v1] Cleanup the BlockTable in InputBatch (#13977)
    Signed-off-by: Chen Zhang <zhangch99@outlook.com>

iefgnoix | c3b6559a10 | 2025-02-28 11:01:36 -07:00
    [V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379)
    Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
    Signed-off-by: mgoin <mgoin64@gmail.com>
    Co-authored-by: mgoin <mgoin64@gmail.com>

Harry Mellor | 4be4b26cb7 | 2025-02-28 08:56:44 -08:00
    Fix entrypoint tests for embedding models (#14052)

Brayden Zhong | 2aed2c9fa7 | 2025-02-28 16:42:07 +00:00
    [Doc] Fix ROCm documentation (#14041)
    Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

Yang Liu | 9b61dd41e7 | 2025-02-28 07:36:08 -08:00
    [Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)

Cyrus Leung | f7bee5c815 | 2025-02-28 07:35:55 -08:00
    [VLM][Bugfix] Enable specifying prompt target via index (#14038)

Jee Jee Li | e0734387fb | 2025-02-28 15:22:42 +00:00
    [Bugfix] Fix MoeWNA16Method activation (#14024)

Harry Mellor | f58f8b5c96 | 2025-02-28 15:20:29 +00:00
    Update AutoAWQ docs (#14042)
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Thibault Schueller | b3f7aaccd0 | 2025-02-28 00:52:25 -08:00
    [V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)

Kacper Pietkun | b91660ddb8 | 2025-02-28 00:51:49 -08:00
    [Hardware][Intel-Gaudi] Regional compilation support (#13213)

Harry Mellor | 76c89fcadd | 2025-02-28 00:50:43 -08:00
    Use smaller embedding model when not testing model specifically (#13891)

Mathis Felardos | b9e41734c5 | 2025-02-28 07:53:45 +00:00
    [Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)
    Signed-off-by: Mathis Felardos <mathis@mistral.ai>