Qidong Su
|
4587063267
|
Patch DeepEP when building docker image with CUDA 13 (#29154)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2025-11-22 23:25:13 +00:00 |
|
Wentao Ye
|
472fdee974
|
[Chore] Update batch invariant code owner (#29246)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-22 13:50:02 -08:00 |
|
Yizhou
|
df78aeef08
|
Refactor: Move CUDA graph dispatch logic earlier (#27382)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-11-22 16:10:31 -05:00 |
|
Nick Hill
|
7df331c66b
|
[BugFix] Fix chunked prompt logprobs + preemption (#29071)
|
2025-11-22 16:07:18 -05:00 |
|
Benjamin Bartels
|
eb5352a770
|
[CI/build] Removes source compilation from runtime image (#26966)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-11-22 10:23:09 -08:00 |
|
Cyrus Leung
|
d1cf8214e5
|
[Bugfix] Use HF config fields as fallback when loading Mistral config (#29239)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-22 11:22:48 -07:00 |
|
Fadi Arafeh
|
730bd35378
|
[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON (#29193)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-11-22 09:04:36 -08:00 |
|
Federico
|
f55c76c2b3
|
chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240)
|
2025-11-22 08:42:48 -08:00 |
|
ZiTian Zhao
|
d84d8f4429
|
Fix EVS crash when using video_embeds inputs in Qwen2.5-VL (#29232)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 06:48:59 -08:00 |
|
Cyrus Leung
|
ae66818379
|
[Misc] Fix pre-commit (#29238)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-22 06:48:01 -08:00 |
|
Nick Hill
|
d44a63c6d6
|
[BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-22 22:41:25 +08:00 |
|
Nicolò Lucchesi
|
066209a045
|
[Attention] Refactor FA block_size limitations to hybrid models only (#29084)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-22 06:38:44 -08:00 |
|
Bram Wasti
|
5f7209a793
|
[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-22 21:00:50 +08:00 |
|
yihong
|
2d4978a57e
|
fix: clean up function never used in setup.py (#29061)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-11-22 05:00:04 -08:00 |
|
Nandan Vallamdasu
|
6965a392a4
|
Fix: Resolve circular import in model_loader/utils.py (#29189)
Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com>
Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 04:58:22 -08:00 |
|
Cyrus Leung
|
5a4802588e
|
[Misc] Further clean up chunked prefill and prefix caching init (#29186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-22 19:34:15 +08:00 |
|
rasmith
|
8e22da1d7f
|
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py (#29229)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 11:00:54 +00:00 |
|
rasmith
|
a4fdf2405c
|
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 10:59:39 +00:00 |
|
Jane (Yuan) Xu
|
e6309acdba
|
Simplify from_blob usage in get_cuda_view_from_cpu_tensor (#29027)
Signed-off-by: Jane Xu <janeyx@meta.com>
|
2025-11-22 10:35:32 +00:00 |
|
jinghanhu
|
988ee66b0d
|
Handle triton kernel import exception (#29062)
|
2025-11-22 10:07:50 +00:00 |
|
Mads Kildegård
|
ea38474ac5
|
[Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests (#29175)
Signed-off-by: Mads Kildegård <mkildegaard99@gmail.com>
|
2025-11-22 09:58:22 +00:00 |
|
Andrew Xia
|
742e9ff6b3
|
[responsesAPI] parse reasoning item input (#28248)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 15:42:11 +08:00 |
|
Woosuk Kwon
|
e9056056fb
|
[Model Runner V2] Limit cudagraph size to max decode batch size (#29221)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-21 20:21:35 -08:00 |
|
Jee Jee Li
|
1489902b53
|
[LoRA] Cleanup FusedMoEWithLoRA (#29187)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-22 04:01:30 +00:00 |
|
Yanan Cao
|
933f67ecd8
|
[Bugfix] Fix a conditional to not check zero value (#28754)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-21 19:59:07 -08:00 |
|
rasmith
|
fd65015a14
|
[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-21 20:34:33 -07:00 |
|
Yihua Cheng
|
77e1c035d0
|
[chore][LMCache connector] Remove useless logs from lmcache connector (#29069)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2025-11-22 03:18:00 +00:00 |
|
rasmith
|
6f403501a0
|
[CI/Build][AMD] Enable Entrypoints Integration Test (Pooling) to run without error on ROCm (#29212)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 02:13:18 +00:00 |
|
FlintyLemming
|
052950e5b3
|
Add fused MoE config for H200 E160 N192 fp8 (#29182)
Signed-off-by: FlintyLemming <admin@flinty.moe>
|
2025-11-21 17:37:51 -08:00 |
|
qli88
|
1ef9c9e294
|
[CI/Build] Disable test_gptoss_tp.py in 'LoRA TP Test' group for ROCm platform (#29204)
Signed-off-by: qli88 <qiang.li2@amd.com>
|
2025-11-21 17:36:19 -08:00 |
|
Jie Luo
|
5c8f2adf50
|
[Bugfix] Fix block size in block_table with PCP (#29094)
Signed-off-by: Livinfly <luojie3m@gmail.com>
|
2025-11-22 01:34:28 +00:00 |
|
Ryan Rock
|
ed8e6843cc
|
[CI/Build] Add terratorch for AMD (#29205)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-11-21 17:31:22 -08:00 |
|
Lukas Geiger
|
d045e22dfe
|
[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-21 17:30:55 -08:00 |
|
Wentao Ye
|
1d34eb11e0
|
[CI] Bug: Fix triton import issue (#29202)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 17:14:49 -08:00 |
|
Charlie Fu
|
9a3101b2ba
|
[ROCm][CI] Fix DeepSeek V2-Lite Accuracy CI (#29135)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-11-21 17:11:02 -08:00 |
|
Angela Yi
|
d5dbdbfcb2
|
[docs] Fix cudagraph mode config (#29170)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-11-21 17:10:27 -08:00 |
|
Lucas Wilkinson
|
30d6466238
|
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens (#29102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-22 00:47:05 +00:00 |
|
Woosuk Kwon
|
e9af6ba62a
|
[Model Runner V2] Optimize Gumbel Sampling Kernel (#29210)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-21 15:52:28 -08:00 |
|
Mark McLoughlin
|
c6fa3895e9
|
[KV Connector] Fix async connector prefix cache metrics (#28585)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-21 17:45:00 -05:00 |
|
Varun Sundar Rabindranath
|
3137991f55
|
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-21 14:28:17 -08:00 |
|
Julien Denize
|
57430fc95c
|
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 13:58:59 -08:00 |
|
Lucas Wilkinson
|
c68c7b403d
|
[BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-21 13:58:32 -08:00 |
|
Ning Xie
|
53a1ba6ec5
|
[log] add weights loading time log to sharded_state loader (#28628)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-21 21:06:09 +00:00 |
|
Lucas Wilkinson
|
1840c5cb18
|
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-21 11:41:52 -08:00 |
|
Woosuk Kwon
|
1bed891f72
|
[Chore] Fix pre-commit error after #25266 (#29190)
|
2025-11-21 10:21:40 -08:00 |
|
Cyrus Leung
|
ceca060501
|
[Deprecation] Deprecate seed=None (#29185)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 18:19:25 +00:00 |
|
Charlie Fu
|
75648b16dd
|
[ROCm][CI] Fix config/test_config_generation.py (#29142)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-11-21 17:12:16 +00:00 |
|
Chendi.Xue
|
460d02a417
|
[NIXL] Fix after virtual block_size for host_buffer with heter kv_layout (#29122)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-11-21 08:55:27 -08:00 |
|
Mingyuan Ma
|
b4c8fbaae2
|
Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 09:54:11 -07:00 |
|
rasmith
|
e99e467384
|
[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lightning_attn.py (#29132)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-21 11:53:09 -05:00 |
|