7638 Commits

Author SHA1 Message Date
Reid
cf4cd53982
[Misc] Add logger.exception for TPU information collection failures (#20510)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-05 07:24:32 -07:00
Isotr0py
32c9be2200
[v1] Re-add fp32 support to v1 engine through FlexAttention (#19754)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-05 09:41:10 +00:00
Lucia Fang
8aeaa910a2
Fix unknown attribute of topk_indices_dtype in CompressedTensorsW8A8Fp8MoECutlassMethod (#20507)
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-07-05 14:03:20 +08:00
Jee Jee Li
906e05d840
[Misc] Remove the unused LoRA test code (#20494)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-05 13:48:16 +08:00
Reid
ef9a2990ae
[doc] small fix (#20506)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-04 20:56:39 -07:00
Reid
7e90870491
[Misc] Add security warning for development mode endpoints (#20508)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-04 20:52:13 -07:00
Guy Stone
d3f05c9248
[Doc] fix mutltimodal_inputs.md gh examples link (#20497)
Signed-off-by: Guy Stone <guys@spotify.com>
2025-07-04 16:41:35 -07:00
Michael Goin
c108781c85
[CI Bugfix] Fix pre-commit failures on main (#20502) 2025-07-04 14:17:30 -07:00
Duncan Moss
3d184b95b8
[feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
2025-07-04 12:58:04 -06:00
Thomas Parnell
2f35a022e6
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-07-04 17:46:53 +00:00
Chenheli Hua
ffe00ef77a
[Misc] Small: Remove global media connector. Each test should have its own test connector object. (#20395)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-04 08:15:03 -07:00
Peter Pan
5561681d04
[CI] add kvcache-connector dependency definition and add into CI build (#18193)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-04 06:49:18 -07:00
Cyrus Leung
fbd62d8750
[Doc] Fix classification table in list of supported models (#20489)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-04 06:08:02 -07:00
wang.yuqi
2e26f9156a
[Model][3/N] Automatic conversion of CrossEncoding model (#20168)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-04 05:47:39 -07:00
sangbumlikeagod
9e5452ee34
[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
2025-07-04 11:28:07 +00:00
Michael Goin
0e3fe896e2
Support Llama 4 for fused_marlin_moe (#20457)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-04 07:55:10 +00:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Wentao Ye
783921d889
[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-04 15:06:24 +08:00
Aaron Pham
4a98edff1f
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-04 15:05:49 +08:00
Reid
a7bab0c9e5
[Misc] small update (#20462)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 20:33:44 -07:00
汪志鹏
25950dca9b
Add ignore consolidated file in mistral example code (#20420)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-07-04 02:55:07 +00:00
Gabriel Marinho
a4113b035c
[Platform] Add custom default max tokens (#18557)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
2025-07-04 10:50:17 +08:00
Michael Goin
7e1665b089
[Misc] Change warn_for_unimplemented_methods to debug (#20455) 2025-07-04 02:35:08 +00:00
Seiji Eicher
8d1096e7db
[Bugfix] Register reducer even if transformers_modules not available (#19510)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-03 22:08:12 +00:00
Nicolò Lucchesi
8d775dd30a
[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning (#20400)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:56:09 -07:00
bnellnm
78fe77534b
[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-07-03 14:55:40 -07:00
Yuxuan Zhang
2f2fcb31b8
[Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
v0.9.2rc1
2025-07-03 21:41:13 +00:00
Ning Xie
1dba2c4ebe
[Misc] adjust for ipv6 for mookcacke url parse (#20107)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-03 20:27:17 +00:00
Isotr0py
71d6de3a26
[Misc] Clean up InternVL family config registration (#19992)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-03 20:01:47 +00:00
Alexei-V-Ivanov-AMD
536fd33003
[CI] Trimming some failing test groups from AMDPRODUCTION. (#20390) 2025-07-03 08:21:31 -07:00
Reid
619b9f5c7e
[Frontend] fix duplicate output for bench subcmd (#20446)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 08:02:06 -07:00
Nicolò Lucchesi
d1b689c445
[Bugfix] Fix flaky test_streaming_response test (#20363)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:46:24 +00:00
Reid
9854dc9040
[Frontend] improve vllm bench <bench_type> --help display (#20430)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 14:22:16 +00:00
Isotr0py
ff5c60fad8
[Misc] Automatically tag PRs to add new models (#20222)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-03 07:11:03 -07:00
wang.yuqi
6f1229f91d
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-03 13:59:23 +00:00
Jee Jee Li
1819fbda63
[Quantization] Bump to use latest bitsandbytes (#20424)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-03 21:58:46 +08:00
Li, Jiang
7f0367109e
[CI/Build][CPU] Enable cross compilation in CPU release pipeline (#20423)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-03 05:26:12 -07:00
Ning Xie
fb14d53cf6
[Kernel] refactor cpu worker v0 cache dtype (#20080)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-03 08:39:14 +00:00
Cyrus Leung
b024a42e93
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-03 08:18:30 +00:00
Michael Yao
cb97f2bfc5
[Docs] Replace two list with tables in intel_gaudi.md (#20414)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-03 00:48:25 -07:00
Reid
359200f6ac
[doc] fix link (#20417)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 00:21:57 -07:00
Lifans
220aee902a
[Misc] Add rules to label Speculative Decoding Related PRs (#20406)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-07-02 23:56:49 -07:00
Nick Hill
67d25eca05
[Tests] Update online DP tests to verify that requests are balanced (#20157)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-03 14:49:13 +08:00
qscqesze
363528de27
[Feature] Support MiniMax-M1 function calls features (#20297)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-07-03 06:48:27 +00:00
QiliangCui
4ff61ababa
[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 (#20385)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-03 06:46:41 +00:00
Li, Jiang
0ec3779df7
[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-02 20:11:36 -07:00
Chenheli Hua
b616f6a53d
[Misc] Small: Fix video loader return type annotations. (#20389)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-03 03:10:39 +00:00
bnellnm
2e25bb12a8
[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py (#20381)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-07-03 02:07:43 +00:00
Louie Tsai
9965c47d0d
Enable CPU nightly performance benchmark and its Markdown report (#18444)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-07-02 17:50:25 -07:00
Nick Hill
059d4cdb49
[BugFix] Fix DP headless mode arg validation (#20398)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 17:15:32 -07:00