Alexei-V-Ivanov-AMD
804e3468c0
Update AMD test definitions (2025-12-08) ( #30298 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-12-09 17:31:30 +00:00
Wentao Ye
83319b44c2
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled ( #29897 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-09 10:40:37 -05:00
Lucas Wilkinson
56037dfa2f
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded ( #30173 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-09 10:36:12 -05:00
quanliu
5dcd593baf
[Feature] Batch-Invariant Support for FA2 and LoRA ( #30018 )
Signed-off-by: quanliu <18646313696@163.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 10:01:38 -05:00
Julien Denize
5c213d2899
[BUGFIX] Mistral tool call parser v11+ ( #30332 )
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-09 14:55:38 +00:00
vllmellm
ee14644ba9
[ROCm] Aiter Quant Kernels ( #25552 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-12-09 14:27:37 +00:00
Dongjie Zou
1166c31cc7
[Bugfix]: Fix glm46 awq marlin moe wna16 compatibility ( #30210 )
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-09 12:20:21 +00:00
haoyangli-amd
03416eada6
[bugfix][quantization] Fix fp8 per_tensor scale shape ( #30257 )
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
2025-12-09 19:28:50 +08:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. ( #30056 )
2025-12-09 18:54:08 +08:00
Jaya Yuan
67475a6e81
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA ( #30309 )
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
2025-12-09 08:22:14 +00:00
wang.yuqi
9c32df6101
[Bugfix] Qwen 3 VL Embedding loading ( #30303 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 08:04:02 +00:00
Micah Williamson
aeb82b1930
[CI] Fix Flaky test_eagle_max_len Test ( #30306 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-09 07:33:34 +00:00
Lucas Wilkinson
aed846917f
[Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding ( #29644 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 07:24:01 +00:00
Yongtao Huang
e4605d225e
[Misc] Fix safetensors import for safe_open ( #30300 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-09 06:50:06 +00:00
Tsukasa OI
58d5b3f514
[Model][Quantization] Restore MoE + GGUF models support (incl. Qwen3 MoE) by allowing Sideload Parameters ( #30116 )
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-09 05:30:05 +00:00
Fanli Lin
c2e1987a6e
[Doc] update Intel GPU MM status in Feature x Hardware matrix ( #30294 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2025-12-09 05:16:44 +00:00
Fadi Arafeh
e130845984
[CPU][CI] Enable fused MoE tests in Arm CI ( #30132 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-09 04:55:39 +00:00
liangel-02
4b03b50211
update torchao safetensors impl ( #30155 )
Signed-off-by: Angel Li <liangel@meta.com>
2025-12-09 12:46:35 +08:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors ( #30201 )
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
Michael Goin
03b91f7262
[Bugfix] Fix compressed-tensors models failing to load with transformers backend ( #30287 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 20:44:28 -08:00
czhu-cohere
f6227c22ab
[Kernel]Support W4A8 Grouped GEMM on Hopper ( #29691 )
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-12-08 19:29:06 -08:00
gnovack
ea657f2078
Lora MoE Align Improvements ( #29257 )
Signed-off-by: gnovack <gnovack@amazon.com>
2025-12-09 10:35:16 +08:00
Kevin H. Luu
db14f61f2d
[ci] Refactor CI file structure ( #29343 )
2025-12-08 17:25:43 -09:00
Micah Williamson
78c7503364
[ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI ( #29420 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-09 02:14:02 +00:00
Christina Norman
e41312a2f5
[Bugfix] Skip generation config fallback for GGUF to prevent multi-process hang ( #30209 )
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 01:52:43 +00:00
Yanan Cao
7b35011ad1
Mark qwen2_5_vl as xfail ( #30283 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-09 01:14:10 +00:00
Zhewen Li
ae339b1a67
[Bugfix] Fix DeepGEMM after #29546 ( #30267 )
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Zhewen Li <zhewenli@meta.com>
2025-12-09 01:05:27 +00:00
Wentao Ye
0ee6416f67
[Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement ( #30159 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:44:01 -05:00
Wentao Ye
d9417096d1
[Feature] Batch invariant: Enable TRITON_MLA without prefix-caching ( #29125 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:31:57 -05:00
Ming Yang
9d6235ca9a
[moe] Allow disabling DP chunking ( #29936 )
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-09 00:29:36 +00:00
Victor Ziliang Peng
f1599ca55d
feat(metrics): Add prefill KV compute metric excluding cached tokens ( #30189 )
Signed-off-by: Ziliang Peng <ziliang@character.ai>
2025-12-09 00:08:48 +00:00
Ming Yang
60d17251c9
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP ( #28782 )
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-09 00:01:08 +00:00
Lain
1fb632fdb6
[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum ( #29795 )
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
2025-12-08 15:02:34 -08:00
Charlie Fu
6af70e11a0
[ROCm][CI] Fix test_max_len.py for Rocm ( #29916 )
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
2025-12-08 16:58:30 -05:00
roikoren755
ae0f69b16a
Add SpecDec support to selective_state_update ( #29488 )
Signed-off-by: Roi Koren <roik@nvidia.com>
2025-12-08 16:45:18 -05:00
Dmitry Tokarev
799804d140
Bump nvshmem to 3.3.24 and fix CUDA 13 installation ( #30149 )
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 20:24:34 +00:00
Vasiliy Kuznetsov
0d402d2600
online fp8 quant with streaming weight post-processing ( #29196 )
Signed-off-by: vasiliy <vasiliy@fb.com>
2025-12-08 20:15:10 +00:00
Johnny Yang
d1b5e7afbf
[TPU] Bump tpu-inference to 0.12.0 ( #30221 )
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-12-08 20:10:10 +00:00
shaharmor98
fcd5306f65
Add latent MoE support ( #30203 )
Signed-off-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-08 17:35:01 +00:00
weiguihua2
398a596ed2
[MP executor] fix get device count for multi node of mp executor feature ( #30042 )
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-09 01:33:48 +08:00
Jee Jee Li
67312cad11
[Misc] Split the LoRA code ( #30253 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-09 00:59:31 +08:00
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig ( #27432 )
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
Daniel Cámpora
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. ( #27568 )
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-08 06:55:58 -08:00
Ye (Charlotte) Qi
eb1051fb95
[ROCm] Guard group quant RMS norm fusion patterns ( #30239 )
2025-12-08 14:44:48 +00:00
Jee Jee Li
80433e225e
[LoRA] Reduce the loading time of MoE LoRA ( #30243 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-08 13:29:47 +00:00
Harry Mellor
5c2433a6f3
Add tip for mypy and markdownlint to the pre-commit comment ( #30259 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-08 13:11:51 +00:00
Simon Mo
77072e93b3
[docs] governance documents ( #24801 )
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-08 12:06:20 +00:00
wang.yuqi
2e660c2434
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. ( #30249 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 12:01:21 +00:00
Shiming Zhang
408cf42f67
[CI] Prevents triggering of an inactive issue/PR check for forked repository. ( #29654 )
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
2025-12-08 10:29:14 +00:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API ( #26686 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00