jvlunteren
01a583fea4
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel ( #21197 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-09-18 14:27:01 +00:00
Roger Wang
21da73343a
[Misc] Clean up flags in vllm bench serve ( #25138 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-18 12:43:33 +00:00
Asaf Joseph Gardin
66072b36db
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support ( #24883 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-09-18 12:21:17 +00:00
Chauncey
cc935fdd7e
[Frontend] Support setting logprobs to -1 ( #25031 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-18 10:34:42 +00:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config ( #22772 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
Gerard Finol
aa3f105c59
Add 'path' option to ImagePrompt data_format ( #25081 )
...
Signed-off-by: Gerard Finol <gerard.finol@urv.cat>
2025-09-18 02:02:14 -07:00
Benjamin Chislett
b7433ca1a4
[Spec Decode] Efficient padded speculation ( #24539 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-09-18 01:07:24 -04:00
Woosuk Kwon
5c65a72bb1
[V0 Deprecation] Remove more V0 tests ( #25117 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 22:05:25 -07:00
Andrew Sansom
bec060fd99
Mark prompt logprobs as incompatible with prompt embeds at API level ( #25077 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-17 21:25:07 -07:00
Woosuk Kwon
7fb2a5be28
[V0 Deprecation] Skip PP test ( #25128 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 20:18:36 -07:00
Woosuk Kwon
6c036615dc
[V0 Deprecation] Remove misc V0 tests ( #25118 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 19:41:55 -07:00
Woosuk Kwon
2fc24e94f9
[V0 Deprecation] Remove V0 Tracing & Metrics tests ( #25115 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 19:40:44 -07:00
Woosuk Kwon
2c3c1bd07a
[V0 Deprecation] Remove V0 Engine tests ( #25114 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 19:38:09 -07:00
bnellnm
5963b98b46
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses ( #22537 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
elvischenv
e6585ddb45
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel ( #24833 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-17 16:37:23 -07:00
afeldman-nm
7ae9887542
[V1] Logits processor docs ( #22919 )
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>
2025-09-17 11:53:12 -07:00
Michael Goin
e3db5ebb66
[CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor ( #25086 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 11:15:05 -07:00
Woosuk Kwon
9d442b7c48
[V0 Deprecation] Remove V0 tests in test_sequence.py ( #25088 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 11:08:45 -07:00
Woosuk Kwon
eb68c2dcd9
[CI] Revert back prepare_prompts and check_answers ( #25087 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 11:03:16 -07:00
Woosuk Kwon
4b946d693e
[V0 Deprecation] Remove V0 Core tests ( #25082 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 09:32:42 -07:00
Michael Goin
087c6ffc92
[CI Bugfix] Fix failing test_invalid_env ( #25078 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 08:28:58 -07:00
danielafrimi
252ada5559
Add RADIO Vision Encoder Support to vLLM ( #24595 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>
2025-09-17 05:53:30 -07:00
Cyrus Leung
e120533d7a
[Misc] Avoid use of deprecated AutoModelForVision2Seq ( #25065 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-17 12:19:15 +00:00
Chauncey
544fe76b95
[Frontend] Support returning all prompt logprobs ( #24956 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-17 09:03:52 +00:00
Zhuohan Li
6c47f6bfa4
[Core] Remove tokenizer group in vLLM ( #24078 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-17 08:42:59 +00:00
haoyangli-amd
ca2d1925ef
[Rocm] [quantization] Fix quark ptpc moe and add test case ( #24649 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Co-authored-by: Haoyang Li <haoyang.li@amd.com>
2025-09-16 22:15:13 -07:00
Roger Wang
0f7acdd73c
[Model] Support Qwen3-VL Model Series ( #24727 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-17 05:01:04 +00:00
Woosuk Kwon
5801e49776
[V0 Deprecation] Remove MQLLMEngine ( #25019 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Nick Hill
eeb135eb87
[Core] Use CpuGpuBuffer for block table tensors ( #24795 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-16 19:18:06 -07:00
Tahsin Tunan
cef32104b4
[FP8] Extend per-token-group quantization support to QuantFP8 ( #24342 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
Michael Goin
493b10f8bf
[CI] GPT-OSS GPQA eval test for Blackwell ( #24920 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 18:13:21 -07:00
Andrew Sansom
02d4b85454
Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs ( #24987 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-16 14:06:56 -07:00
Andrew Xia
86daa875fe
[gpt-oss][1][bugfix] fix streaming final output ( #24466 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 13:56:16 -06:00
Andrew Xia
f4d6eb95cf
[gpt-oss][1b] streaming add item id, content id ( #24788 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 18:41:12 +00:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM ( #23693 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Ming Yang
4e5affeaa1
[CI] Add Decode Context Parallelism (DCP) test to CI ( #24487 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-09-16 21:21:28 +08:00
Chen Bruce
7ea5c73ad7
[Feat][EPLB] A novel static EPLB placement strategy for MoE models. ( #23745 )
...
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
cascade
17871983a2
[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism ( #24021 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-09-16 04:32:32 +00:00
Woosuk Kwon
759ef49b15
Remove V0 Encoder-Decoder Support ( #24907 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Mark McLoughlin
2942970d44
[Metrics] Hide deprecated metrics with gpu_ prefix ( #24245 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-15 20:15:57 -06:00
Wentao Ye
3c96e7b8a1
[CI] Small Accuracy Eval Test for Deepseek Model ( #24259 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:14:50 -06:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias ( #24895 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Jiangyun Zhu
5bcc153d7b
[Compile] Fix noop_elimination pass and add tests for noop_elimination ( #24880 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-15 23:33:18 +00:00
Mickaël Seznec
45bfa49cb8
[Tests] fix initialization of kv hash in tests ( #24273 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-09-15 21:48:27 +00:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse ( #24561 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms ( #24106 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py ( #24659 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Isotr0py
0e219cd50b
[Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 ( #24822 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 20:45:06 +08:00
ant-yy
72c99f2a75
[Model]: support Ling2.0 ( #24627 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 05:09:30 -07:00
Ning Xie
bc0f6059a2
[UT] enhance free kv cache block queue popleft_n ( #24220 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 10:04:37 +00:00