8780 Commits

Author SHA1 Message Date
Pavani Majety
1d353b6352
[Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-08-21 16:02:11 -04:00
Ning Xie
3496274663
[Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-21 15:49:09 -04:00
Chen Zhang
8a19303173
[BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-21 10:31:11 -07:00
Nick Hill
603fbbbce0
[Misc] Misc code cleanup/simplification (#23304)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-21 17:22:55 +00:00
Ming Yang
10f535c086
[Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-08-21 10:22:18 -07:00
Wentao Ye
48bfb0c9b7
[Bug] Fix R1 Accuracy 0 Bug (#23294)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 13:11:28 -04:00
Lain
f8ce022948
add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 17:05:47 +00:00
Yi Liu
0278f1ac3a
Fix nvfp4 swizzling (#23140)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-21 16:54:50 +00:00
Benji Beck
a482e4e769
Migrate MolmoImageInputs to TensorSchema (#22022)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-21 16:54:08 +00:00
youkaichao
e0b056e443
[ci/build] Fix abi tag for aarch64 (#23329)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-21 23:32:55 +08:00
Roger Wang
79f05e4436
[Multimodal] Always enable hashing mm data (#23308)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-21 07:23:28 -07:00
jerryzhuang
f8daddcc4c
[Bugfix] set system_message in phi4mini chat template (#23309)
Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
2025-08-21 14:22:39 +00:00
Robert Shaw
c8e33c72c6
[V1] Remove unnecessary check for main thread (#23298)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-08-21 14:08:35 +00:00
wang.yuqi
d70a16625d
[Performance] V1 Pooling Models E2E Performance Optimization (#23162)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-21 13:26:09 +00:00
Cyrus Leung
5cc54f7c5b
[Doc] Fix batch-level DP example (#23325)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-21 06:16:38 -07:00
Cyrus Leung
0c6e40bbaa
[Refactor] Simplify code for MM budget (#23310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-21 08:00:16 +00:00
Paul Pak
2e2000f352
[Model] Add LFM2 architecture (#22845)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-08-21 09:35:07 +02:00
Jared O'Connell
31282401b6
[BugFix] Fix Python 3.9 Support (#23306)
Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-20 23:23:56 -07:00
Cyrus Leung
0c31e28e95
[Bugfix] Fix extra whitespace in strings caused by newline (#23272)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 22:03:00 -07:00
22quinn
f571ff8eb6
[Sampler] Support returning final logprobs (#22387)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-20 21:28:32 -07:00
Michael Goin
f64ee61d9e
[CI] Block the cu126 wheel build while broken (#23285)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-21 04:21:05 +00:00
QiliangCui
8993073dc1
[CI] Delete images older than 24h. (#23291)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-08-20 21:15:20 -07:00
杨奇(yann qi)
655a09f653
[Model][VLM] Support R-4B Model (#23246)
Signed-off-by: yannqi <yannqi@qq.com>
Signed-off-by: 杨奇(yann qi) <51905299+yannqi@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: yannqiyang <yannqiyang@tencent.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-21 04:08:52 +00:00
Wentao Ye
f94bf9b924
[Compile] Fix Compile Warning SM100 Cutlass MLA (#23287)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-21 03:09:39 +00:00
Asaf Joseph Gardin
3663870c72
[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035)
Signed-off-by: asafg <asafg@ai21.com>
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-20 20:08:51 -07:00
Cyrus Leung
2461d9e562
[CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 20:05:20 -07:00
Li, Jiang
7be5d113d8
[CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
Woosuk Kwon
b029de9902
[Optimization] Make new_block_ids None if empty (#23262)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-20 18:25:56 -07:00
Michael Goin
bbea1cefdd
[CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 17:18:12 -07:00
Russell Bryant
f5aa307d77
Remove duplicate entry in vllm.attention.__all__ (#23296)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-20 17:14:59 -07:00
22quinn
4b795020ed
[EP] Add logging for experts map (#22685)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-20 23:46:06 +00:00
shixianc
c86af22f31
[Fix] remove is_marlin param in benchmark_moe (#23286)
2025-08-20 22:04:21 +00:00
Matthew Bonanni
10cc12ba66
Feature/mla tests (#23195)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-08-20 21:46:47 +00:00
Matthew Bonanni
a4fbb32fab
Remove chunked_prefill_enabled flag in V1 MLA (#23183)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-08-20 21:43:17 +00:00
youkaichao
1b125004be
[misc] fix multiple arch wheels for the nightly index (#23110)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-20 14:15:34 -07:00
rongfu.leng
4fbda0b20c
[Feature] use --eplb_config to set eplb param (#20562)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-20 14:07:28 -07:00
Russell Bryant
4e51fa8cba
Do not use eval() to convert unknown types (#23266)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-20 13:28:30 -07:00
Saurabh Misra
bf7c99dfc4
[Perf] Speed up function _convert_tokens_to_string_with_added_encoders by 13.7x (#20413)
Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com>
Signed-off-by: Aseem Saxena <aseem.bits@gmail.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
2025-08-20 13:17:11 -07:00
Chen Zhang
b95697d731
[Frontend] improve error logging of chat completion (#22957)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-20 13:03:37 -07:00
bigmoyan
582bbe6bd7
[Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259)
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
2025-08-20 12:59:54 -07:00
Michael Goin
0cdbf5e61c
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 15:13:36 -04:00
dongluw
ebe56a0064
Small fix for Command-A-Vision (#23268)
Signed-off-by: donglu <donglu@cohere.com>
2025-08-20 18:15:18 +00:00
Russell Bryant
f77a0802b7
Limit HTTP header count and size (#23267)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2025-08-20 17:57:37 +00:00
Benji Beck
c4477f55e5
Migrate Mistral3ImagePixelInputs to TensorSchema (#21945)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-20 17:37:29 +00:00
Yong Hoon Shin
dfd2382039
[torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-20 16:52:59 +00:00
JartX
3b11b26b50
[FIXBUG ] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER (#22795)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-20 09:08:29 -07:00
Woosuk Kwon
d6d13bd49e
[Misc] Add max_seq_len to CommonAttentionMetadata (#23216)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-20 09:05:29 -07:00
Cyrus Leung
5efd6905bc
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 23:42:28 +08:00
shixianc
b17109beea
[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)
Signed-off-by: Shixian Cui <shixian@amazon.com>
2025-08-20 10:35:26 -04:00
Cyrus Leung
4449235843
[Bugfix] Ensure correctness of HCXVision processing (#23254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 14:19:30 +00:00