yifant-code
5ccf0efa84
[Bugfix] Improve error messages in ModelConfig validation (#30213)
Signed-off-by: ytian218 <ytian218@bloomberg.net>
Co-authored-by: ytian218 <ytian218@bloomberg.net>
2025-12-14 21:23:37 +08:00
ElizaWszola
994acec0cc
[Bugfix] Fix fusion for VL models (#30244)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-12-14 21:22:37 +08:00
zifeitong
48b8456ff9
[Bugfix] Revert Qwen2-VL part of change in #28271 (#30542)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
2025-12-14 05:20:08 -08:00
Drew Botwinick
5b64ac21f9
[Bugfix] Update get_processor_data to use get_all method (#30583)
Signed-off-by: Drew Botwinick <6953152+dbotwinick@users.noreply.github.com>
2025-12-14 21:19:20 +08:00
Bin Bao
a8ec486592
[Misc] Add a script to benchmark compilation time (#29919)
Signed-off-by: Bin Bao <binbao@meta.com>
2025-12-14 13:02:39 +00:00
tjp_zju
6ecc1e411b
[Bugfix] fix _get_quant_method of FusedMoE for deepseekV3.2 on non-NV… (#30057)
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com>
2025-12-14 02:20:51 -08:00
Shengliang Xu
0bb0bae436
Nvidia ModelOpt workaround for issue 28072 (#30164)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2025-12-14 18:18:31 +08:00
Johannes F
060893654d
fix: Update json features supported by xGrammar (#30390)
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com>
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-14 02:16:06 -08:00
Matthias Gehre
e9add129ad
[Bugfix] awq_gemm: fix argument order swap (#30364)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 18:15:37 +08:00
Ilya Markov
3224ea9915
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-14 18:15:11 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support (#30539)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
drslark
add1b9d3de
[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632)
Signed-off-by: drslark <slarksblood@qq.com>
2025-12-14 01:32:16 -08:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Laith Sakka
f569c654e1
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-14 08:14:06 +00:00
Micah Williamson
97f2f160fd
[ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI (#30590)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 06:56:26 +00:00
Kayvan Mivehnejad
29f7d97715
Improve parse_raw_prompt test cases for invalid input .v2 (#30512)
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
2025-12-14 11:18:41 +08:00
Qier Li
dc7fb5bebe
[Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher (#30577)
Co-authored-by: Qier Li <qier@fb.com>
2025-12-14 01:23:08 +00:00
Qidong Su
24429d5924
[Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
2025-12-13 21:56:53 +00:00
Wentao Ye
6e78ed6ba7
[Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-13 16:12:53 -05:00
Isotr0py
7c16f3fbcc
[Doc] Add documents for multi-node distributed serving with MP backend (#30509)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 18:02:29 +00:00
lif
ddbfbe5278
[Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)
Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-13 08:37:59 -09:00
Laith Sakka
763963aa73
set assume_32bit_indexing and pass unbacked hints (#30459)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-13 15:36:53 +00:00
Cyrus Leung
39cefbdf17
[Refactor] TokenizerRegistry only uses lazy imports (#30609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 23:16:22 +08:00
Chen Zhang
ace34e3783
[Bugfix] Qwen3-next with --hf-overrides \{\"num_hidden_layers\":8\} (#30433)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-12-13 22:12:45 +08:00
Isotr0py
e5db3e2774
[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 04:43:01 -08:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Nick Hill
1cec5b7ea9
[Scheduler] Simplify stop checking for pooling models (#30591)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-13 09:45:26 +00:00
Cyrus Leung
b09806e28f
[Bugfix] Dictionary MM embeddings for online chat (#30507)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 15:48:56 +08:00
Tsukasa OI
fdc135d768
[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization (#30310)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-13 13:55:14 +08:00
Roberto L. Castro
4fa7ce46f3
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
Nicolò Lucchesi
57e9bf1864
[CI] Whisper logprobs tests (#30504)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-13 10:49:11 +08:00
Michael Goin
2f32a68d75
[CI] Update several models in registry that are available online now (#30514)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-12-12 18:28:13 -08:00
Matthew Bonanni
f5dfbbd8e9
[Docs] Remove references to VLLM_ATTENTION_BACKEND (#30564)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-13 10:20:15 +08:00
Michael Goin
fc0119425c
Add IBM and Red Hat to compute resources sponsors (#30581)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-12-13 01:34:23 +00:00
Matthew Bonanni
86a3261525
[Bugfix] Pass FA version in MultiHeadAttention (#30575)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-13 00:02:11 +00:00
rasmith
08f8a5627e
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 18:41:56 -05:00
Kevin H. Luu
b4039c08b5
[ci] Mark PrimeRL integration test as soft fail (#30578)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-12 14:13:09 -08:00
Wentao Ye
1e6b115300
[Refactor] Reduce duplicate code in per_token_group_quant cuda kernels (#30496)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-12 16:45:23 -05:00
danielafrimi
13618626df
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748)
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-12 20:42:32 +00:00
danielafrimi
6ec0d8dbe4
[Fix] Load kv-cache dtype from hf_quant_config.json automatically (#29980)
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
2025-12-12 11:27:47 -08:00
Li, Jiang
9693dd0fe3
[CI/Build] Add x86 CPU wheel release pipeline (#28848)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-12 19:21:35 +00:00
Xin Yang
1f19d8f899
[Perf] Set split_k to 1 for triton_kernels (#30528)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-12-12 14:07:57 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
Wentao Ye
02a5880394
[CI] Fix mypy for vllm/v1/executor (#30517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-12 18:05:34 +00:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-12-12 09:03:35 -08:00
Benjamin Bartels
f3237f3f6b
[Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-12 16:28:54 +00:00
jvlunteren
9c0ee995a8
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-12-12 16:55:40 +01:00
Michael Goin
09ad3b76b3
[Bug] Fix attention_backend arg string parsing (#30534)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-12 08:40:50 -07:00
Christina Norman
dc13c99eed
fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408)
Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 23:10:12 +08:00