ElizaWszola
994acec0cc
[Bugfix] Fix fusion for VL models ( #30244 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-12-14 21:22:37 +08:00
Johannes F
060893654d
fix: Update json features supported by xGrammar ( #30390 )
...
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com>
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-14 02:16:06 -08:00
Matthias Gehre
e9add129ad
[Bugfix] awq_gemm: fix argument order swap ( #30364 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 18:15:37 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support ( #30539 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt ( #30612 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Laith Sakka
f569c654e1
enable unbacked with aot_compile ( #30462 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-14 08:14:06 +00:00
Kayvan Mivehnejad
29f7d97715
Improve parse_raw_prompt test cases for invalid input .v2 ( #30512 )
...
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
2025-12-14 11:18:41 +08:00
Cyrus Leung
39cefbdf17
[Refactor] TokenizerRegistry only uses lazy imports ( #30609 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 23:16:22 +08:00
Isotr0py
e5db3e2774
[CI/Build] Fix broken mm processor test Mistral-3-large ( #30597 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 04:43:01 -08:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports ( #30601 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Cyrus Leung
b09806e28f
[Bugfix] Dictionary MM embeddings for online chat ( #30507 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 15:48:56 +08:00
Roberto L. Castro
4fa7ce46f3
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM ( #30484 )
...
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
Nicolò Lucchesi
57e9bf1864
[CI] Whisper logprobs tests ( #30504 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-13 10:49:11 +08:00
Michael Goin
2f32a68d75
[CI] Update several models in registry that are available online now ( #30514 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-12-12 18:28:13 -08:00
rasmith
08f8a5627e
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality ( #30292 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 18:41:56 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix ( #26668 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding ( #30059 )
...
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-12-12 09:03:35 -08:00
Benjamin Bartels
f3237f3f6b
[Frontend] Fixes anthropic streaming message_start usage nesting ( #30266 )
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-12 16:28:54 +00:00
jvlunteren
9c0ee995a8
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel ( #28306 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-12-12 16:55:40 +01:00
Lucas Wilkinson
3e41992fec
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 ( #27532 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-12 05:57:47 -08:00
Jaehwang Jung
f90319d5d1
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features ( #29692 )
2025-12-12 02:27:20 -08:00
rasmith
302b2c1eb9
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. ( #30291 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 09:30:23 +00:00
Ben Browning
8f8fda261a
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting ( #28729 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-12 12:59:53 +08:00
Andreas Karatzas
783644e4ac
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available ( #30527 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-12 03:54:56 +00:00
Michael Goin
9f2fc16a69
[Bugfix][Model] Fix Afmoe rope_parameters issue ( #30505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-12 02:53:57 +00:00
rasmith
ba80926681
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 ( #30508 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 01:02:19 +00:00
rasmith
48661d275f
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm ( #30417 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 00:24:20 +00:00
Andrew Briand
a00d88973d
[EPLB] Support EPLB w/ NVFP4 ( #29804 )
...
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
2025-12-11 22:59:40 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim ( #30389 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend ( #30340 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00
Julien Denize
aa3c250c48
[IMPROVEMENT] Change MistralReasoningParser behavior ( #30391 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-11 17:53:26 +01:00
Shengqi Chen
305b168a9f
[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version ( #30341 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-12 00:42:30 +08:00
Martin Hickey
f4417f8449
[KVConnector] Add KV events to KV Connectors ( #28309 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2025-12-11 15:30:29 +01:00
Cyrus Leung
d917747c95
[Bugfix] Fix task still being passed in tests/benchmarks ( #30476 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 10:33:55 +00:00
jeremyteboul
4515eb1a0b
[Fix] Update lazy loading of video loader backend ( #30444 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-11 10:14:57 +00:00
Rei.
6299628d32
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. ( #29882 )
...
Signed-off-by: Rei <1477174254@qq.com>
2025-12-11 09:05:08 +00:00
Ning Xie
d02d1043de
fix: enhance human_readable_int function ( #30337 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-12-10 23:30:33 -08:00
Wentao Ye
d6464f2679
[Chore] Fix torch precision warning ( #30428 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 04:05:56 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings ( #30397 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
shivampr
8580919ac3
[Bugfix] fix confusing OOM errors during v1 init ( #28051 )
...
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-10 23:17:41 +00:00
Jialin Ouyang
9f042ba26b
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well ( #29289 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-10 14:13:01 -05:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration ( #26813 )
...
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Nicolò Lucchesi
c756fb6781
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph ( #30072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-10 06:14:24 -08:00
Aditya Tewari
cebda2a4af
[CPU] Support for Whisper ( #30062 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
2025-12-10 04:58:42 -08:00
Fadi Arafeh
434ac76a7c
[cpu][ci] Add CPU Attention Tests for Neon Backend ( #30347 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-10 05:37:35 +00:00
Andreas Karatzas
ed7af3178a
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group ( #29358 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 05:33:13 +00:00
Micah Williamson
7d80c73d42
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance ( #30367 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 02:35:49 +00:00
rasmith
b75f826fca
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform ( #30020 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-10 02:28:37 +00:00
Andrew Xia
c3487aca34
[responsesAPI][6] Fix multi turn MCP tokenization ( #30230 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-10 10:13:13 +08:00