Le Chen
3d7363e61c
[Config] add "qwen" as a native eagle3 target supported model ( #22333 )
...
Signed-off-by: lechen <lecself@163.com>
Signed-off-by: LeChen <lecself@163.com>
2025-08-09 20:21:05 -07:00
Jee Jee Li
0c5254b82a
[oss] Init gpt-oss bf16 support ( #22508 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-09 20:19:13 -07:00
Thomas Parnell
61f67d8acd
[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers ( #21401 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-09 20:16:11 -07:00
TJian
42172ad18f
[FEAT] [Performance] Add triton mrope to replace the torch code path ( #22375 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-09 11:50:03 -07:00
Isotr0py
fbd8595c5c
[Bugfix] Fix basic models tests hanging due to mm processor creation ( #22571 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-09 11:42:21 -07:00
Nicolò Lucchesi
5a16fa614c
[Model] Gemma3n MM ( #20495 )
...
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-09 09:56:25 -07:00
Harry Mellor
2d18256e47
Move ParallelConfig from config/__init__.py to config/parallel.py ( #22565 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 08:33:46 -07:00
Harry Mellor
56186474f6
[Docs] Reduce noise in docs and --help from the JSON tip ( #22567 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 08:31:32 -07:00
Thomas Parnell
1bf5e1f25b
[CI] [Hybrid] Speed up hybrid models test by removing large models ( #22563 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-09 02:04:42 -07:00
Yuxuan Zhang
a6022e6fbc
GLM-4.5V with new class name at transformers ( #22520 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-09 00:50:21 -07:00
Thomas Parnell
2be07a0db1
Update docs for Minimax-Text support ( #22562 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-09 00:18:18 -07:00
Jee Jee Li
0edc0cd52b
[Bugfix] Fix CI moe kernel failure ( #22556 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-09 00:03:29 -07:00
Isotr0py
7920e9b1c5
[Bugfix] Fix failing GPT-OSS initialization test ( #22557 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-09 00:03:26 -07:00
Charlie Fu
b7c0942b65
[ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel ( #22097 )
...
Signed-off-by: charlifu <charlifu@amd.com>
2025-08-08 23:15:06 -07:00
Kyuyeun Kim
9a0c5ded5a
[TPU] Add support for online w8a8 quantization ( #22425 )
...
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2025-08-08 23:12:54 -07:00
Eldar Kurtić
10a02535d4
Fix loading of quantized BigCode models ( #22463 )
...
Signed-off-by: Eldar Kurtic <eldar@neuralmagic.com>
2025-08-08 23:12:12 -07:00
Cyrus Leung
65552b476b
[Misc] Use config definitions from Transformers library ( #21913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-08 23:10:51 -07:00
Or Ozeri
7ad7adb67f
v1: Pass KVConnectorOutput to scheduler-side ( #22157 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-08-08 23:09:51 -07:00
Thomas Parnell
6ade99eafa
[V1] [Hybrid] Support Minimax-Text-01 in V1 ( #22151 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 23:08:48 -07:00
Wentao Ye
3157aebb63
[Log] Add Warning for Deprecation of DeepGEMM old version ( #22194 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-08 23:07:48 -07:00
Thomas Parnell
8a0ffd6285
Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation ( #22541 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 23:05:32 -07:00
Roger Wang
23472ff51c
[Doc] Add usage of implicit text-only mode ( #22561 )
...
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
2025-08-08 23:04:19 -07:00
Roger Wang
08b751ba74
Implicit language-model-only mode via limit-mm-per-prompt ( #22299 )
...
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: Linkun <github@lkchen.net>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
Co-authored-by: Junhao Li <streaver91@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com>
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: lkchen <github@lkchen.net>
2025-08-08 22:21:40 -07:00
Isotr0py
429e4e2d42
[Bugfix] Fix ModernBert cuda graph capturing in v1 ( #21901 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-08 22:17:22 -07:00
Pradyun92
35afe1b30b
[BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D ( #22317 )
...
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-08-08 17:04:15 -07:00
Kunshang Ji
81c57f60a2
[XPU] upgrade torch 2.8 on for XPU ( #22300 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-08 17:03:45 -07:00
Russell Bryant
311d875614
Drop flaky test_healthcheck_response_time ( #22539 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-08 16:56:47 -07:00
Harry Mellor
e3edc0a7a8
Extract CompilationConfig from config.py ( #22524 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 16:34:25 -07:00
yyweiss
baece8c3d2
[Frontend] Add unix domain socket support ( #18097 )
...
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
2025-08-08 16:23:44 -07:00
Guy Stone
2fcf6b27b6
[Docs] fix broken links in metrics.md ( #22315 )
...
Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 16:22:35 -07:00
Harry Mellor
41b9655751
Skip Qwen 1 in CI because remote code is no longer compatible with Transformers ( #22536 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 16:20:58 -07:00
Thomas Parnell
bd875d2eb7
[Bugfix] Update FA commit hash ( #22546 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 16:10:25 -07:00
Varun Sundar Rabindranath
f703b923f3
[Misc] DeepGEMM : Avoid JIT generation in the hot-path ( #22215 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-08 16:09:59 -07:00
Lucas Wilkinson
cd9b9de1fb
[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA ( #21691 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-08 16:09:42 -07:00
Chen Zhang
fe6d8257a1
[gpt-oss] Support tool call and implement MCP tool server ( #22427 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-08 15:06:37 -07:00
Ricardo Decal
e290594072
[Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” ( #22466 )
...
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-08-08 19:26:21 +00:00
Yongye Zhu
f756a682d9
[gpt-oss] guard import when triton kernel is not installed ( #22529 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-08 11:18:33 -07:00
Daniel Serebrenik
f0964e29cb
[Benchmark] Add benchmark tool for multi turn conversations ( #20267 )
2025-08-08 10:28:50 -07:00
Yongye Zhu
e789cad6b8
[gpt-oss] triton kernel mxfp4 ( #22421 )
...
Signed-off-by: <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-08 08:24:07 -07:00
Harry Mellor
e5ebeeba53
Remove exception for Python 3.8 typing from linter ( #22506 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 03:06:46 -07:00
Harry Mellor
7be7f3824a
[Docs] Improve API docs (+small tweaks) ( #22459 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 03:02:51 -07:00
Nick Hill
ccdae737a0
[BugFix] Don't cancel asyncio tasks directly from destructors ( #22476 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-08 01:13:18 -07:00
rongfu.leng
904063907c
[Misc] fix openai version ( #22485 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-08-08 01:12:54 -07:00
Cyrus Leung
43c4f3d77c
[Misc] Begin deprecation of get_tensor_model_*_group ( #22494 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-08 01:11:54 -07:00
Cyrus Leung
1712543df6
[CI/Build] Fix multimodal tests ( #22491 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-08 00:31:19 -07:00
lkchen
808a7b69df
[bench] Fix benchmark/serve.py to ignore unavailable results ( #22382 )
...
Signed-off-by: Linkun <github@lkchen.net>
2025-08-07 23:15:50 -07:00
iAmir97
099c046463
[Doc] Sleep mode documentation ( #22310 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-08 12:25:18 +08:00
Po-Han Huang (NVIDIA)
af473f0a85
[bugfix] Fix Llama3/4 issues caused by FlashInfer 0.2.10 ( #22426 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-08-07 20:25:01 -07:00
Cyrus Leung
157f9c1368
Fix pre-commit ( #22487 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 20:21:54 -07:00
ZiTian Zhao
6f287915d8
Optimize MiniCPMO mask creation with vectorized implementation ( #22464 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
2025-08-07 20:18:50 -07:00