Icey
888152bf87
Allow oot custom compiler extension via CompilerInterface (#28623)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-11-25 15:25:15 +08:00
Ryan Rock
fe3a4f5b34
[CI/Build] Pin torchgeo dependency for AMD (#29353)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-11-25 07:14:59 +00:00
Fadi Arafeh
98caeadd54
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei (#29273)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-25 15:11:11 +08:00
vllmellm
64deead719
[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-25 06:56:06 +00:00
Nick Hill
7992324f23
[BugFix] Use unique ids for different transcription prompts (#29372)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 06:55:16 +00:00
Inoki
40a6f53f6c
Display warning only when ROCm version is less than Pytorch required version (#29200)
Signed-off-by: Inoki <inoki@inoki.cc>
2025-11-25 14:40:06 +08:00
kflu
ce58fdc1c3
Fix PoolingParams.skip_reading_prefix_cache type (#29364)
Signed-off-by: KFL <kludev@gmail.com>
2025-11-25 06:39:29 +00:00
Fanli Lin
a21256c463
Add TP CLI argument to multimodal inference examples (#29301)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-25 06:03:20 +00:00
Harry Mellor
316c8492bf
Scheduled removal of guided_* config fields (#29326)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 05:24:05 +00:00
Lucas Wilkinson
2d9ee28cab
[CI/Test Fix] Fix CP tests on Blackwell (#29338)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-24 20:55:57 -08:00
Jiangyun Zhu
81db702ed2
[Attention] add _cudagraph_support for linear attention (#28934)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-25 12:25:20 +08:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support (#29327)
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-25 03:28:51 +00:00
Maryam Tahhan
87185c88d5
[Bugfix] Make deprecated --task embedding consistent with `--runner… (#29312)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
2025-11-25 03:19:52 +00:00
Mark McLoughlin
9cf4edae6e
[Metrics] Scheduled removal of deprecated metrics (#29330)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-25 11:15:13 +08:00
汪志鹏
7012d8b45e
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB (#29060)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-11-24 19:54:00 -07:00
Divakar Verma
22b42b5402
[CI][ROCm] Install arctic-inference on ROCm tests (#29344)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-11-25 02:15:39 +00:00
gbyu-amd
cb7214d8ea
[ROCm][MLA] enable fp8 MLA decode on ROCm (#28032)
Signed-off-by: guanbao <gyu@amd.com>
Signed-off-by: Guanbao Yu <gyu@amd.com>
Signed-off-by: gbyu-amd <Guanbao.Yu@amd.com>
Co-authored-by: guanbao <gyu@amd.com>
2025-11-25 10:15:02 +08:00
Pleaplusone
77e10c9cab
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-24 19:05:46 -07:00
Michael Goin
6f1355a1b7
[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 19:01:40 -07:00
Harry Mellor
a4ad43ad5a
Scheduled removal of ParallelConfig's direct child EPLB fields (#29324)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 01:58:58 +00:00
Nick Hill
a178a0b40b
[BugFix] Fix duplicate id tool-call race condition (#29355)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 01:54:26 +00:00
Kunshang Ji
b8328b49fb
[XPU] upgrade torch & ipex 2.9 on XPU platform (#29307)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-25 09:34:47 +08:00
Hanjie Qiu
5f9679a43b
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-24 20:13:12 -05:00
Wentao Ye
699bca76c0
[UX] Raise error for attn backend of batch invariant (#29348)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-24 17:49:01 -07:00
Michael Goin
c17610e2ba
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-24 18:22:46 -05:00
Chen Zhang
71df2a57ef
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-11-24 14:28:32 -08:00
Tyler Michael Smith
4dd42db566
Remove VLLM_SKIP_WARMUP tip (#29331)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-11-24 22:16:05 +00:00
Nick Hill
84371daf75
[Tests] Verify gpt_oss package is installed in harmony tests (#29336)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-24 22:04:31 +00:00
Woosuk Kwon
f32c7d6f54
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 13:54:59 -08:00
Yan Ma
3cfa63ad99
[XPU]fix Kimi-VL-A3B-thinking on xpu (#29309)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-11-24 21:02:21 +00:00
Benjamin Bartels
4d6afcaddc
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
2025-11-24 11:40:54 -08:00
Woosuk Kwon
97588c4d12
[Model Runner V2] Add minor clarification comments for Eagle (#29332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 11:28:56 -08:00
Chenheli Hua
839c6b7b72
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-24 19:24:37 +00:00
bnellnm
8f066146c3
[MoE][Refactor] Make select_experts a non-static method (#29067)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-11-24 13:38:04 -05:00
Woosuk Kwon
cec418b5df
[Model Runner V2] Change Numba AoT to JIT (#29328)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 09:34:37 -08:00
Woosuk Kwon
cc313cb73d
[Model Runner V2] Implement Single-step Eagle 1 (#29300)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 09:32:27 -08:00
Nicolò Lucchesi
26a465584a
[NIXL] Use config to enable telemetry + NIXL version bump (#29305)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-24 17:18:04 +00:00
Varun Sundar Rabindranath
e924bbb4f4
[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 (#29195)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-24 16:06:17 +00:00
Aydin Abiar
656516c315
[Bugfix] properly handle nested json with llama3 tool parser (#27701)
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-24 15:28:51 +00:00
vllmellm
e48b2e6848
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-24 15:24:49 +00:00
Laith Sakka
7a228b5305
Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-24 10:12:41 -05:00
Yuan Tang
f716a15372
Update KServe guide link in documentation (#29258)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-11-24 14:40:05 +00:00
WeiQing Chen
2601f18a82
[EPLB] Optimize EPLB for Async Rearrange Experts (#22179)
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: SunChenxiang123 <1291824390@qq.com>
2025-11-24 09:08:29 -05:00
R3hankhan
4de87866a8
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (#28926)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-11-24 12:08:09 +00:00
Didier Durand
eca7a8fb59
[Doc]: fix typos in various files (#29230)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-24 11:10:48 +00:00
杰兮
8005e606bf
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-11-24 10:16:52 +00:00
rongfu.leng
68dfe28eae
[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-11-24 02:02:28 -08:00
Fanli Lin
ed40d85929
[BugFix] Fix R-VL model loading error (#29299)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-11-23 22:48:45 -08:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
tongqiu
5253f4276f
[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention (#28376)
Signed-off-by: apinge <Tong.Qiu2@amd.com>
2025-11-24 03:26:00 +00:00