Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars ( #30563 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio ( #30878 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-17 18:49:56 +01:00
Jialin Ouyang
6e9dbcc50e
[Fix] uniform decode batch check ( #30747 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-17 19:58:43 +08:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory ( #30749 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Michael Goin
519ef9a911
[UX] Make vllm bench serve discover model by default and use --input-len ( #30816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi
a100152288
[Kernels][FI] Skip trtllm attention when num_kv_heads=1 ( #30842 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-17 01:54:21 -08:00
Xinyu Chen
3b1d440ede
CustomOp: grouped topk ( #29575 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2025-12-17 17:43:00 +08:00
Robin
20fda43151
[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction ( #30555 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-12-17 16:37:57 +08:00
Cyrus Leung
44d3b1df3d
[CI/Build] Fix compatibility between #30244 and #30396 ( #30787 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-16 20:21:19 -08:00
Wentao Ye
b6ec077e05
[CI] Skip ci failure test ( #30804 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 22:47:53 +00:00
Roger Wang
f5f51e5931
[Core][MM] Optimize encoder cache manager by operating with embeddings only ( #30475 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Lucas Wilkinson
9fec0e13d5
[Attention] Cache attention metadata builds across hybrid KV-cache groups ( #29627 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-12-16 17:10:16 -05:00
Wentao Ye
f21f5ea38c
[Refactor] Small refactor for group topk ( #30562 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-16 14:50:59 -05:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints ( #30769 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test ( #30723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:28:34 -05:00
Harry Mellor
af506fd76a
Fix instantiation of HfHubHTTPError in LoRA test ( #30768 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:24 -08:00
Isotr0py
4de08ad698
[CI/Build] Skip broken ViT backend functionality test temporarily ( #30782 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 06:45:25 -08:00
Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput missing lora_request ( #30636 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-16 01:36:35 -08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser ( #30158 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO ( #30120 )
...
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Boyuan Feng
c881db364e
improve lazy import test ( #30733 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 03:12:05 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic ( #29873 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony ( #30237 )
...
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args ( #30708 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU ( #30693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-15 13:45:36 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model ( #30666 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) ( #28439 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel ( #30282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. ( #30125 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 ( #30649 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams ( #29013 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
Chendi.Xue
ae2e503dda
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 ( #30420 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-14 15:38:28 +00:00
ElizaWszola
994acec0cc
[Bugfix] Fix fusion for VL models ( #30244 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-12-14 21:22:37 +08:00
Johannes F
060893654d
fix: Update json features supported by xGrammar ( #30390 )
...
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com>
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-14 02:16:06 -08:00
Matthias Gehre
e9add129ad
[Bugfix] awq_gemm: fix argument order swap ( #30364 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 18:15:37 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support ( #30539 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt ( #30612 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Laith Sakka
f569c654e1
enable unbacked with aot_compile ( #30462 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-14 08:14:06 +00:00
Kayvan Mivehnejad
29f7d97715
Improve parse_raw_prompt test cases for invalid input .v2 ( #30512 )
...
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
2025-12-14 11:18:41 +08:00
Cyrus Leung
39cefbdf17
[Refactor] TokenizerRegistry only uses lazy imports ( #30609 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 23:16:22 +08:00
Isotr0py
e5db3e2774
[CI/Build] Fix broken mm processor test Mistral-3-large ( #30597 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 04:43:01 -08:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports ( #30601 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Cyrus Leung
b09806e28f
[Bugfix] Dictionary MM embeddings for online chat ( #30507 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 15:48:56 +08:00
Roberto L. Castro
4fa7ce46f3
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM ( #30484 )
...
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
Nicolò Lucchesi
57e9bf1864
[CI] Whisper logprobs tests ( #30504 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-13 10:49:11 +08:00
Michael Goin
2f32a68d75
[CI] Update several models in registry that are available online now ( #30514 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-12-12 18:28:13 -08:00
rasmith
08f8a5627e
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality ( #30292 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 18:41:56 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix ( #26668 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding ( #30059 )
...
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-12-12 09:03:35 -08:00