danielafrimi
6ec0d8dbe4
[Fix] Load kv-cache dtype from hf_quant_config.json automatically ( #29980 )
...
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
2025-12-12 11:27:47 -08:00
Li, Jiang
9693dd0fe3
[CI/Build] Add x86 CPU wheel release pipeline ( #28848 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-12 19:21:35 +00:00
Xin Yang
1f19d8f899
[Perf] Set split_k to 1 for triton_kernels ( #30528 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-12-12 14:07:57 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix ( #26668 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
Wentao Ye
02a5880394
[CI] Fix mypy for vllm/v1/executor ( #30517 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-12 18:05:34 +00:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding ( #30059 )
...
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-12-12 09:03:35 -08:00
Benjamin Bartels
f3237f3f6b
[Frontend] Fixes anthropic streaming message_start usage nesting ( #30266 )
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-12 16:28:54 +00:00
jvlunteren
9c0ee995a8
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel ( #28306 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-12-12 16:55:40 +01:00
Michael Goin
09ad3b76b3
[Bug] Fix attention_backend arg string parsing ( #30534 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-12 08:40:50 -07:00
Christina Norman
dc13c99eed
fix(gguf): Disable bfloat16 for GGUF on Blackwell devices ( #30408 )
...
Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 23:10:12 +08:00
Vladislav Nosivskoy
3e34adcdfb
[DeepSeek V3.2] Proper drop_thinking logic ( #30490 )
...
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2025-12-12 15:01:06 +00:00
Lucas Wilkinson
3e41992fec
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 ( #27532 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-12 05:57:47 -08:00
吴坎
91401c7a26
[Bugfix] Fix CMakeLists Environment Variable ( #21804 )
...
Signed-off-by: wu-kan <github@wu-kan.com>
Signed-off-by: 吴坎 <github@wu-kan.cn>
Signed-off-by: wu-kan <github@wu-kan.cn>
2025-12-12 10:54:52 +00:00
Jaehwang Jung
f90319d5d1
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features ( #29692 )
2025-12-12 02:27:20 -08:00
rasmith
302b2c1eb9
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. ( #30291 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 09:30:23 +00:00
Ben Browning
8f8fda261a
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting ( #28729 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-12 12:59:53 +08:00
Zhengxu Chen
fe1787107e
[compile] Parse compile range cache keys as Range during cache loading. ( #30516 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-12 04:30:51 +00:00
Andreas Karatzas
783644e4ac
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available ( #30527 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-12 03:54:56 +00:00
Ryan Rock
197473c4e7
[CI/Build] Use spawn subprocess for ROCm ( #30272 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-12-12 03:33:17 +00:00
Nick Hill
947dfda9c2
[LMCache] Relax lmcache version requirement ( #30425 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-11 18:18:47 -09:00
Michael Goin
9f2fc16a69
[Bugfix][Model] Fix Afmoe rope_parameters issue ( #30505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-12 02:53:57 +00:00
Bhanu Prakash Voutharoja
6a6fc41c79
GPTQ Marlin quantization support for fused MoE with LoRA ( #30254 )
...
Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
2025-12-12 02:27:22 +00:00
Fadi Arafeh
f355ad5412
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly ( #30481 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-12 02:09:25 +00:00
Lucas Wilkinson
042da73244
[Core] Refactor _build_attention_metadata ( #29628 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-11 17:54:12 -08:00
Andreas Karatzas
b5945d49c0
[ROCm][CI] Use mi325_4 agent pool for V1 e2e tests ( #30526 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-12 01:37:24 +00:00
rasmith
ba80926681
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 ( #30508 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 01:02:19 +00:00
jiahanc
0ab23c2b2b
[fix] fix SM check for Flashinfer TRTLLM MOE ( #30314 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-12 01:00:58 +00:00
rasmith
48661d275f
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm ( #30417 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 00:24:20 +00:00
Ev Lacey
d527cf0b3d
[FIX] Patch run-cluster.sh (fix for #28328) ( #30002 )
...
Signed-off-by: elacey <elacey@nvidia.com>
Signed-off-by: Ev Lacey <github@everettlacey.com>
2025-12-11 23:36:31 +00:00
Concurrensee
2cc5affc38
[ROCM][CI] Fix AMD Examples Test Group ( #30276 )
...
Signed-off-by: Yida Wu <yida.wu@amd.com>
Signed-off-by: Yida <yida.wu@amd.com>
2025-12-11 18:03:54 -05:00
Andrew Briand
a00d88973d
[EPLB] Support EPLB w/ NVFP4 ( #29804 )
...
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
2025-12-11 22:59:40 +00:00
Wentao Ye
61249b177d
[Refactor] Remove useless syncwarp ( #30510 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 17:43:41 -05:00
Wentao Ye
c817b14151
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement ( #30494 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: li-jinpeng <3332126450@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-11 17:28:34 -05:00
ioana ghiban
3efdc3feae
[Docs][CPU backend] Add pre-built Arm CPU Docker images ( #30491 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-11 22:03:29 +00:00
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching ( #29421 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-11 21:06:51 +00:00
Xingyu Liu
90d6cf921f
[BugFix][MM] Support VLLM_RANDOMIZE_DP_DUMMY_INPUTS ( #30472 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 21:00:15 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim ( #30389 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Zhengxu Chen
92fea56fd1
[compile] Stop one-off setting enable_aot_compile and use context manager instead. ( #30503 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-11 20:28:03 +00:00
Ye (Charlotte) Qi
e458270a95
[Misc] Add mcp to requirements ( #30474 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-11 20:06:09 +00:00
Andreas Karatzas
72aaac5b66
[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding ( #30430 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-11 19:25:01 +00:00
汪志鹏
0e71eaa644
[Feature] AWQ Marlin quantization support for fused MoE with LoRA ( #30442 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-12-11 18:03:32 +00:00
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend ( #30340 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00
Julien Denize
aa3c250c48
[IMPROVEMENT] Change MistralReasoningParser behavior ( #30391 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-11 17:53:26 +01:00
Shengqi Chen
305b168a9f
[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version ( #30341 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-12 00:42:30 +08:00
Harry Mellor
93db3256a4
Give pooling examples better names ( #30488 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 16:22:58 +00:00
ioana ghiban
17cb540248
[Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels ( #30402 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 15:57:10 +00:00
Harry Mellor
97a042f3bc
Make the httpx logger less annoying when Transformers v5 is installed ( #30480 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 15:44:56 +00:00
Cyrus Leung
3a3b06ee70
[Misc] Improve error message for is_multimodal ( #30483 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 06:39:51 -08:00
Martin Hickey
f4417f8449
[KVConnector] Add KV events to KV Connectors ( #28309 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2025-12-11 15:30:29 +01:00
Qiu
a11f4a81e0
[Misc][PCP&DCP] relocate PCP feature check ( #30050 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 03:36:18 -08:00