xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-23 17:24:25 +08:00

Author	SHA1	Message	Date
Jee Jee Li	421707dec1	Merge branch 'main' into mlm-full-lora-support	2025-12-12 15:00:59 +08:00
Jee Jee Li	208dc0c954	Fix comments Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-12 00:05:07 +00:00
Nicolò Lucchesi	0efd9f867c	[Core] Whisper Enable Encoder Batching (#29421 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-11 21:06:51 +00:00
Harry Mellor	cf3eacfe58	Standardise `get_rope` to use `rope_parameters["partial_rotary_factor"]`, not `rotary_dim` (#30389 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 20:45:23 +00:00
B-201	e10321bf6a	Merge branch 'main' into mlm-full-lora-support	2025-12-12 00:04:59 +08:00
bk-201	dd857e480f	Merge branch 'mlm-full-lora-support' of https://github.com/jeejeelee/vllm into mlm-full-lora-support	2025-12-11 16:02:37 +00:00
Qiu	a11f4a81e0	[Misc][PCP&DCP] relocate PCP feature check (#30050 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-11 03:36:18 -08:00
wang.yuqi	a5f9fb5960	[Deprecation] Deprecation `--convert reward`, use `--convert embed` instead. (#30463 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-11 10:18:25 +00:00
bk-201	27448490f1	update argument name Signed-off-by: bk-201 <joy25810@foxmail.com>	2025-12-11 06:46:53 +00:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
B-201	d1307e1d29	Merge branch 'main' into mlm-full-lora-support	2025-12-11 11:47:50 +08:00
Will Eaton	a9e4106f28	[P/D] KV Load Failure Recovery/Abort Configuration (#26813 ) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-10 11:00:52 -08:00
Nicolò Lucchesi	c756fb6781	[Core] Whisper enable `FULL_DECODE_ONLY` CudaGraph (#30072 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-10 06:14:24 -08:00
bk-201	5ff0c6fb73	Merge remote-tracking branch 'origin/main' into mlm-full-lora-support	2025-12-10 07:10:58 +00:00
PatrykSaffer	4c2e10ea19	[Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330 ) Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>	2025-12-10 00:47:07 +00:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Laith Sakka	87aee9ed2b	Add evaluate_guards option to DynamicShapesConfig (#27432 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-08 10:46:15 -05:00
wang.yuqi	9e77ffca3f	[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-08 08:10:09 +00:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Wentao Ye	17eb25e327	[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 04:44:50 +00:00
Nick Hill	4026ae31e9	[Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 20:59:04 -08:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Harry Mellor	bf4a901af9	Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 20:53:52 -08:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Alec S	2c174420f5	Reduce validation to a warning (#28749 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 14:02:49 +00:00
B-201	1fbd7287b8	Merge branch 'main' into mlm-full-lora-support	2025-12-05 20:17:40 +08:00
bk-201	113eb2e0b8	add a enable option Signed-off-by: bk-201 <joy25810@foxmail.com>	2025-12-05 12:14:53 +00:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Qiu	0098a6e3da	[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 21:40:51 -05:00
Mercykid-bash	1119f6e47a	Abstract eplb algo (#26471 ) Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Signed-off-by: Mercykid-bash <ruanche0218@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 19:09:09 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
Arpit Khandelwal	dfdda96747	[Core] Remove forced None assignment for deprecated PassConfig flags (#29994 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 09:15:04 +00:00
Xieyang Xu	ad32e3e19c	enable multi-node in external launcher mode (#29833 )	2025-12-03 17:02:02 -08:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
Chauncey	b78772c433	[Frontend] supports deepseekv32 chat template (#29837 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-03 20:53:44 +08:00
Yong Hoon Shin	69520bc695	Add logging for cudagraph related info (#29825 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-12-03 01:01:48 -08:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Isotr0py	63b1da76ba	[Chore]: Reorganize gguf utils funtions under `transformers_utils` (#29891 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-02 17:33:23 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Boyuan Feng	70fb77b4dc	[BugFix] add max-num-batched-token to scheduler hash (#29829 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 08:55:02 +00:00
Wei Wei	fc95521ba5	[Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-12-02 10:58:44 +08:00
Nengjun Ma	eaf81485ed	[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935 ) Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-12-01 15:02:18 -05:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00

362 Commits