86 Commits

Author SHA1 Message Date
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching (#29421)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-11 21:06:51 +00:00
wang.yuqi
a5f9fb5960
[Deprecation] Deprecation --convert reward, use --convert embed instead. (#30463)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-11 10:18:25 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-12-05 20:56:40 -08:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template (#29837)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Isotr0py
63b1da76ba
[Chore]: Reorganize gguf utils funtions under transformers_utils (#29891)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-02 17:33:23 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig (#29749)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 17:14:23 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size (#29741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 17:31:12 +08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. (#29696)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-29 07:58:38 +08:00
wang.yuqi
f4b76056ee
Improve enable chunked_prefill & prefix_caching logic. (#26623)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-27 22:05:48 -08:00
Injae Ryou
0840abdd24
[BugFix] Optional tokenizer argument when loading GGUF models (#29582)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-27 16:53:10 +00:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Zhengxu Chen
0abc79482a
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. (#29435)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-25 21:46:41 +00:00
Harry Mellor
a1f2676879
Scheduled removal of override_pooler_config and disable_log_requests (#29402)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 16:08:57 +00:00
Injae Ryou
794029f012
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-25 14:28:53 +00:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support (#29327)
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-25 03:28:51 +00:00
Maryam Tahhan
87185c88d5
[Bugfix] Make deprecated --task embedding consistent with `--runner… (#29312)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
2025-11-25 03:19:52 +00:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Cyrus Leung
d7219bcda3
[Misc] Move dynamic seed initialization to EngineArgs (#29165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 15:27:44 +00:00
Isotr0py
64192d5624
[Bugfix] Revert custom attention mask for gemma3-mm (#28995)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 13:23:22 +08:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Tova Movshovitz
ba558c029a
[config] Expose get_total_num_hidden_layers() in ModelConfig (#28961)
Signed-off-by: tovam <tovam@pliops.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-19 11:37:11 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 16:34:14 +00:00
Jingchun Gao
4516d44b7f
[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438)
Signed-off-by: gaojc <1055866782@qq.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-14 11:24:10 +00:00
Harry Mellor
b230286fbc
Fix get_num_experts when config sets it explicitly to None (#28652)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
2025-11-13 16:02:42 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-11-12 15:24:12 -08:00
Thomas Parnell
64d57c3be7
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model (#28563)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-11-12 18:17:55 +00:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta (#28006)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-04 18:01:56 +00:00
yt0428
05cae69f0f
[model] Add support for openPangu_Ultra_MoE (#27521)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 08:17:20 -08:00
Asaf Joseph Gardin
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-11-02 04:16:23 -08:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile (#27660)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-30 13:11:29 -07:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM (#27809)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-30 21:02:27 +08:00
wangxiyuan
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module (#27784)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-30 09:42:49 +00:00
Cyrus Leung
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum (#27658)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-28 16:23:35 +00:00
Zhengxu Chen
259504e147
[compile] Add enable_prompt_embeds to compile hash. (#27285)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-28 20:46:03 +08:00
Matthew Bonanni
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-28 12:00:56 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils (#27567)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 15:30:38 +00:00
Asaf Joseph Gardin
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol (#27339)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-10-27 13:05:20 +00:00