Michael Goin
fba7856581
[Perf] Warmup FlashInfer attention during startup ( #23439 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-10 15:03:17 -07:00
Chen Zhang
b5e383cd8b
[gpt-oss] raise error for flashinfer backend without trtllm ( #24482 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-10 14:33:13 -07:00
Gregory Shtrasberg
9a161307f5
[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends ( #19767 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-10 13:59:55 -07:00
Russell Bryant
37e8182bfe
[v1] Add Whisper model support (encoder-decoder) ( #21088 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2025-09-10 13:53:35 -07:00
Nick Hill
4db4426404
[CI] Fail subprocess tests with root-cause error ( #23795 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-10 13:53:21 -07:00
Thien Tran
a0933c3bd6
[Bugfix] Enable FP8 KV cache for FlashInfer and Triton backend on non-sm100 GPUs ( #24577 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-09-10 12:33:41 -07:00
rongfu.leng
09e68bce34
[Misc] update log level debug to warning when process port is used by ( #24226 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-10 11:32:57 -07:00
Xingyu Liu
9fb74c27a7
[Core] Support configuration parsing plugin ( #24277 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-10 11:32:43 -07:00
Ming Yang
4032949630
[Bugfix] Fix DeepEP config for DP4TP4 ( #23619 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-09-10 10:37:56 -07:00
tomeras91
08abfa78ec
[Bugfix] fix modelopt exclude_modules name mapping ( #24178 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-10 10:20:46 -07:00
Shiyan Deng
2bef2d1405
[Logging] allow config logging stream ( #24336 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2025-09-10 15:02:01 +00:00
Robin
36cacd0958
[Doc] Add documentation for GLM-4.5 series models: tool-calling and reasoning parser ( #24589 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-09-10 07:50:55 -07:00
Jee Jee Li
bb3eb80d92
[Core] Split LoRA layers ( #24574 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 07:47:51 -07:00
pwschuurman
fcc0a3130a
[CI] Fix tensorizer test assertion ( #24545 )
...
Signed-off-by: Peter Schuurman <psch@google.com>
2025-09-10 06:57:36 -07:00
zzhxxx
736569da8d
[Platform] Custom ops support for LMhead and LogitsProcessor ( #23564 )
...
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
2025-09-10 06:26:31 -07:00
Kay Yan
2eb9986a2d
[BugFix] python collect_env.py and vllm collect-env compatibility with uv venv ( #24066 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-09-10 21:25:33 +08:00
Hyogeun Oh (오효근)
ccee371e86
[Docs] Fix warnings in mkdocs build (continued) ( #24092 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-10 06:23:28 -07:00
RoadToNowhereX
c0bd6a684a
Fix Auto_Round Quantization Loading on SM75 and Lower GPUs ( #24217 )
...
Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-10 06:22:31 -07:00
co63oc
3144d90217
fix some typos ( #24167 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-10 06:21:23 -07:00
Daniele
2f5e5c18de
[CI/Build] bump timm dependency ( #24189 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-10 06:20:59 -07:00
wang.yuqi
bd98842c8a
[CI] Add PPL test for generation models ( #24485 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-10 06:16:39 -07:00
Lifans
d6069887c6
[rocm] enable torchao quantization for rocm ( #24400 )
...
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-09-10 06:16:21 -07:00
Ye (Charlotte) Qi
492196ed0e
[CI/Build] split true unit tests to Entrypoints Unit Tests ( #24418 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-10 06:16:07 -07:00
Nick Hill
f4f1a8df22
[BugFix] Ensure integrity of reused CPU tensors during async scheduling ( #24527 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: guoze.lin <guozelin@tencent.com>
2025-09-10 21:15:14 +08:00
lacora
0b9a612fa3
[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat ( #24549 )
...
Signed-off-by: lacora2017 <yehu@meta.com>
Co-authored-by: lacora2017 <yehu@meta.com>
2025-09-10 21:14:55 +08:00
Wenlong Wang
4c04eef706
[BugFix][Multi Modal] Fix TensorSchema shape mismatch in Molmo ( #24559 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-10 06:14:27 -07:00
Harry Mellor
f36355abfd
Move LoadConfig from config/__init__.py to config/load.py ( #24566 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-10 06:14:18 -07:00
Yash Pratap Singh
9e3c3a7df2
[LoRA]: Add LoRA support to Mistral's Voxtral models ( #24517 )
...
Signed-off-by: Yash Pratap Singh <yashsingh20001@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 06:12:03 -07:00
baonudesifeizhai
6cbd41909e
Feature/vit attention unification# 23880 ( #23978 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-10 06:10:14 -07:00
danielafrimi
72d30108a0
Support for NemotronH Nano VLM ( #23644 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-09-10 06:10:06 -07:00
Tyler Michael Smith
8b83b93739
[Docs] Document the extra memory footprint overhead when using EPLB ( #24537 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-10 06:09:49 -07:00
Harry Mellor
9dbefd88e9
[Docs] Improve organisation of API Reference nav ( #24569 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-10 06:08:21 -07:00
vllmellm
7c195d43da
[ROCm][Bugfix] Fix Aiter RMSNorm ( #23412 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-09-10 21:08:03 +08:00
Lucas Wilkinson
0ae43dbf8c
[Attention] add DCP support for FLASH_ATTN_MLA backend ( #24453 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-10 17:19:26 +08:00
li-jinpeng
267c80d31f
[Model] Limit CPU threads for image transformations in InternVL to reduce cpu contention. ( #24519 )
...
Signed-off-by: li-jinpeng <3332126450@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-10 16:45:44 +08:00
Flora Feng
77f62613f9
Consolidate rendering parameters into RenderConfig dataclass ( #24543 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-10 08:44:47 +00:00
Remy
feaf202e93
[Bugfix] Guard _may_reorder_batch for encoder-only models on CPU ( #24319 ) ( #24348 )
...
Signed-off-by: Remy <eunhwan.shin@dtonic.io>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-09-10 14:24:42 +08:00
Simon Mo
91130ae376
[docs] promo pytorch conf and ray summit ( #24562 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-09 23:24:20 -07:00
Harry Mellor
e40827280b
[Docs] Enable relative links in examples to function when rendered in the docs ( #24041 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-09 21:40:45 -07:00
pwschuurman
4377b1ae3b
[Bugfix] Update Run:AI Model Streamer Loading Integration ( #23845 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Signed-off-by: Peter Schuurman <psch@google.com>
Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-09 21:37:17 -07:00
Chenheli Hua
009d689b0c
[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. ( #24271 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-09-09 21:36:09 -07:00
Wei
0efdb5c3ba
[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading ( #24154 )
...
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-09-10 04:27:53 +00:00
Wenlong Wang
53b42f4102
[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 ( #24392 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-09 21:24:23 -07:00
Chauncey
309d7aa401
[P/D] MultiConnector supports shutdown ( #24425 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-09 21:24:11 -07:00
Yihua Cheng
b4a01aaf95
[KV Connector] More async support for get_num_new_matched_tokens ( #23620 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2025-09-09 21:23:37 -07:00
Nick Hill
83dd28aae4
[CI] Adjust threshold for flaky ngram spec decoding test ( #24528 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-09 21:07:33 -07:00
Nick Hill
f88e84016f
[BugFix] Fix async core engine client finalizer ( #24540 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-09 21:07:13 -07:00
Ignacio Sica
3c2156b3af
[Hardware][Apple-CPU] Enable native bfloat16 on Apple Silicon (M2 and later) ( #24129 )
...
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
2025-09-10 03:50:21 +00:00
Nick Hill
7e7db04310
[CI] Retry flaky fp8 cutlass mla tests ( #24536 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-09 20:33:10 -07:00
Chen Zhang
41f160b974
Add @heheda12345 to CODEOWNERS of KVCacheManager related code ( #24546 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-10 03:30:32 +00:00