Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig ( #27432 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API ( #26686 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend ( #27938 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement ( #29558 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
Nick Hill
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig ( #30161 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 20:59:04 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface ( #30009 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-12-05 20:56:40 -08:00
Harry Mellor
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set ( #30140 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 20:53:52 -08:00
Bangsheng Tang
77e4472809
let draft model follow target model's config_format ( #30152 )
2025-12-05 13:33:42 -08:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges ( #24252 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments ( #26315 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Alec S
2c174420f5
Reduce validation to a warning ( #28749 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 14:02:49 +00:00
Max Hu
c2894d3883
[Feature] Add Layer-wise NVTX Support ( #29990 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
2025-12-05 11:20:07 +00:00
amitz-nv
6038b1b04b
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH ( #29978 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2025-12-05 00:34:33 -08:00
Qiu
0098a6e3da
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms ( #29952 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 21:40:51 -05:00
Mercykid-bash
1119f6e47a
Abstract eplb algo ( #26471 )
...
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 19:09:09 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode ( #29833 )
2025-12-03 17:02:02 -08:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
...
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template ( #29837 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Yong Hoon Shin
69520bc695
Add logging for cudagraph related info ( #29825 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-12-03 01:01:48 -08:00
Arpit Khandelwal
d7284a2604
[Core] Rename PassConfig flags as per RFC #27995 ( #29646 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 03:38:55 +00:00
Isotr0py
63b1da76ba
[Chore]: Reorganize gguf utils functions under transformers_utils ( #29891 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-02 17:33:23 +00:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored ( #29859 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 ( #29757 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
2025-12-02 10:29:00 +00:00
Boyuan Feng
70fb77b4dc
[BugFix] add max-num-batched-token to scheduler hash ( #29829 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-02 08:55:02 +00:00
Wei Wei
fc95521ba5
[Misc] Throw error on unintended access to scheduler_config.max_model_len ( #29771 )
...
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-12-02 10:58:44 +08:00
Nengjun Ma
eaf81485ed
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode ( #28935 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-12-01 15:02:18 -05:00
shivampr
cabc77cc86
[Core][Observability] Add KV cache residency metrics ( #27793 )
...
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
  vllm:kv_block_lifetime_seconds - total lifetime from allocation to free
  vllm:kv_block_idle_before_evict_seconds - idle duration before eviction
  vllm:kv_block_reuse_gap_seconds - time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
The implementation uses monotonic timestamps for accuracy and 1% sampling for minimal overhead (~48 bytes/block); it is fully thread-safe, with zero runtime cost when disabled.
Two new runtime flags are introduced:
  --kv-cache-metrics - enable KV cache residency metrics
  --kv-cache-metrics-sample - control sampling ratio (default: 0.01)
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-01 18:27:53 +00:00
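The sampling approach described in the KV cache residency metrics commit above ( #27793 ) can be illustrated with a minimal sketch. This is not vLLM's implementation: the class `KVBlockResidencyTracker`, its method names, and the choice to record lifetime and idle time in a single eviction hook are all assumptions made for illustration. It only shows the general idea of sampling a fraction of blocks at allocation and timing them with a monotonic clock. Thread safety and Prometheus export, which the commit mentions, are omitted here.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class BlockTimes:
    allocated_at: float    # monotonic timestamp at allocation
    last_access_at: float  # monotonic timestamp of the latest use


class KVBlockResidencyTracker:
    """Hypothetical sketch of sampled block-residency tracking."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self._clock = clock
        self._rng = rng or random.Random()
        self._tracked = {}            # block_id -> BlockTimes (sampled blocks only)
        self.lifetimes = []           # seconds from allocation to eviction
        self.idle_before_evict = []   # seconds from last use to eviction
        self.reuse_gaps = []          # seconds between consecutive uses

    def on_alloc(self, block_id):
        # Sample once, at allocation; unsampled blocks cost nothing later
        # beyond a dictionary lookup.
        if self._rng.random() < self.sample_rate:
            now = self._clock()
            self._tracked[block_id] = BlockTimes(now, now)

    def on_access(self, block_id):
        times = self._tracked.get(block_id)
        if times is None:
            return
        now = self._clock()
        self.reuse_gaps.append(now - times.last_access_at)
        times.last_access_at = now

    def on_evict(self, block_id):
        times = self._tracked.pop(block_id, None)
        if times is None:
            return
        now = self._clock()
        self.lifetimes.append(now - times.allocated_at)
        self.idle_before_evict.append(now - times.last_access_at)
```

Passing a fake `clock` callable and `sample_rate=1.0` makes the tracker deterministic, which is convenient for unit tests. In a real deployment the three lists would be replaced by histogram observations.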
FredericOdermatt
5d43f7372e
[Doc] Update description disable_any_whitespace ( #29784 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
2025-12-01 16:48:33 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration ( #29767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
Xingyu Liu
21c2627934
[Misc] Remove redundant hidden_size property in ModelConfig ( #29749 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 17:14:23 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size ( #29741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 17:31:12 +08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. ( #29696 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-29 07:58:38 +08:00
Yanan Cao
3461e7efd8
[Frontend] Remap -O to -cc commandline flag ( #29557 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-11-28 21:51:12 +00:00
wang.yuqi
f4b76056ee
Improve enable chunked_prefill & prefix_caching logic. ( #26623 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-27 22:05:48 -08:00
Cyrus Leung
a24ea5414b
[Deprecation] Advance deprecation status ( #29617 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 19:04:58 +00:00
Injae Ryou
0840abdd24
[BugFix] Optional tokenizer argument when loading GGUF models ( #29582 )
...
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-27 16:53:10 +00:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports ( #29540 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Didier Durand
66d3d5422c
[Doc]: fixing typos in diverse files ( #29492 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-27 07:15:50 -08:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
George D. Torres
56531b79cc
[Misc] Add backup hash algorithm for FIPS constrained environments ( #28795 )
...
Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-26 00:50:22 +00:00
Lucia Fang
d8819c88eb
fix assertion for single world use case (uni) ( #29429 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-11-26 00:14:23 +00:00
Zhengxu Chen
0abc79482a
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. ( #29435 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-25 21:46:41 +00:00
Harry Mellor
a1f2676879
Scheduled removal of override_pooler_config and disable_log_requests ( #29402 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 16:08:57 +00:00
Injae Ryou
794029f012
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type ( #29137 )
...
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-25 14:28:53 +00:00
Harry Mellor
51fc9e017a
Scheduled removal of CompilationConfig.use_inductor ( #29323 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 12:55:42 +00:00
Icey
888152bf87
Allow oot custom compiler extension via CompilerInterface ( #28623 )
...
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-11-25 15:25:15 +08:00