Angela Yi
|
612d5ffdab
|
[ci] Fix Pytorch compilation test oom in 2.10 (#31194)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-12-23 01:56:47 +00:00 |
|
Divakar Verma
|
78e5e62bbf
|
[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-23 01:28:19 +00:00 |
|
Robert Shaw
|
b57b967386
|
[MoE Refactor][7/N] AITER MK (#31102)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-22 16:42:58 -07:00 |
|
Michael Goin
|
6d518ffbaa
|
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests (#31182)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-22 15:40:35 -08:00 |
|
Benjamin Chislett
|
85aff45e24
|
[Perf] Remove blocking copy in GDN Attention (#31167)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-12-22 14:25:22 -08:00 |
|
Wentao Ye
|
5312a7284e
|
[Bug] Fix 'CutlassMLAImpl' object has no attribute '_workspace_buffer' (#31173)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-22 14:24:27 -08:00 |
|
Lucas Wilkinson
|
de71747655
|
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-22 13:06:10 -08:00 |
|
Michael Goin
|
9586354053
|
[Doc] Add vllm-metal to hardware plugin documentation (#31174)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-22 20:06:29 +00:00 |
|
Pavani Majety
|
b10f41c894
|
[SM100] Enable fp8 compute for prefill MLA (#30746)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-22 19:15:57 +00:00 |
|
Yongye Zhu
|
7b926e8901
|
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-12-22 17:34:19 +00:00 |
|
Gregory Shtrasberg
|
ab3a85fd68
|
[ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run (#31159)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-12-22 17:19:27 +00:00 |
|
Boyuan Feng
|
8dd0db687b
|
[UX] improve profiler error message (#31125)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-12-22 08:45:59 -08:00 |
|
TJian
|
022f3cea53
|
[ROCm] [Critical]: Remove unused variable (#31156)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-22 08:28:22 -08:00 |
|
Micah Williamson
|
a5bc77c253
|
[AMD][CI] Add "V1 Test e2e + engine" to mi325_8 Agent Pool (#31040)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-22 10:41:56 -05:00 |
|
Nicolò Lucchesi
|
b1c3f96ae3
|
[CI][Bugfix] Fix entrypoints/openai/test_audio.py (#31151)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-22 07:21:40 -08:00 |
|
dengyunyang
|
8f8f469b1b
|
[BugFix] skip language model in Encoder (#30242)
Signed-off-by: dengyunyang <584797741@qq.com>
|
2025-12-22 05:25:59 -08:00 |
|
Shengqi Chen
|
2cf91c2ea4
|
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases (#30781)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-22 13:24:21 +00:00 |
|
AlonKejzman
|
bd6d5a7475
|
[gpt-oss] Fix harmony parser in streaming responses (#30205)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
|
2025-12-22 20:56:06 +08:00 |
|
Li Wang
|
256a33ecb4
|
[Model] Fix bagel failed to run (#31132)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-12-22 02:15:54 -08:00 |
|
Roger Young
|
c02a2705f9
|
Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs (#31083)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
|
2025-12-22 05:28:40 +00:00 |
|
Kevin McKay
|
cf8eed7bef
|
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2025-12-21 21:14:58 -08:00 |
|
Kevin McKay
|
44ae85f725
|
[Misc] Fix typo: 'occured' -> 'occurred' (#31120)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:14:27 -08:00 |
|
Kevin McKay
|
14c3e6ade3
|
[Misc] Fix spelling typos in model comments (#31117)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:14:14 -08:00 |
|
Kevin McKay
|
42b42824ae
|
[Misc] Fix grammar errors in comments and messages (#31115)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:14:02 -08:00 |
|
Kevin McKay
|
ec58c10ce1
|
[Misc] Fix quantization-related typos (#31116)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:48 -08:00 |
|
Kevin McKay
|
8c084de59d
|
[Misc] Fix spelling typos in comments (#31114)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:14 -08:00 |
|
CedricHuang
|
19cc9468fd
|
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)
|
2025-12-21 22:34:49 -05:00 |
|
Jee Jee Li
|
097978a15d
|
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope (#30821)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-21 18:39:22 -08:00 |
|
Lucas Wilkinson
|
7e065eba59
|
[CI] Fix "2 Node Tests (4 GPUs in total)" (#31090)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-22 10:32:40 +08:00 |
|
Steve Westerhouse
|
9d701e90d8
|
[Doc] Clarify FP8 KV cache computation workflow (#31071)
Signed-off-by: westers <steve.westerhouse@origami-analytics.com>
|
2025-12-22 08:41:37 +08:00 |
|
Michael Goin
|
06d490282f
|
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-21 09:41:57 -08:00 |
|
Robert Shaw
|
b471092d3a
|
[MoE Refactor][4/N] Marlin Fp8 Mk (#31036)
|
2025-12-21 12:37:42 -05:00 |
|
Ameen Patel
|
93cabc417c
|
ci: add nvidia-smi warmup before Prime-RL integration test (#31093)
Signed-off-by: AmeenP <ameenp360@gmail.com>
|
2025-12-21 15:43:01 +00:00 |
|
Chauncey
|
bb80f69bc9
|
add aarnphm and chaunceyjiang to the new tool_parser directory (#31088)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-21 03:24:34 +00:00 |
|
汪志鹏
|
3e92b2b7ac
|
[BugFix]fix gpt-oss v1/completions response bug (#30608)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
|
2025-12-21 10:39:31 +08:00 |
|
Jinzhen Lin
|
7c73ceb581
|
[Quantization] add marlin w4a8/w8a8 check (#31061)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-20 21:58:11 +00:00 |
|
Lucas Wilkinson
|
ae0770fa6b
|
[CI] Fix H200 Distributed test (#31054)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-20 16:48:49 -05:00 |
|
Jinzhen Lin
|
ee52d9901d
|
[Quantization] support logical_widths for fp8 marlin (#30962)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-20 12:02:57 -08:00 |
|
baonudesifeizhai
|
54c8924384
|
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
|
2025-12-20 18:22:04 +00:00 |
|
Yan Ma
|
560ae9638c
|
[XPU] enable fp8 online streaming quantization (#30944)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-12-20 13:45:27 +00:00 |
|
Jeffrey Wang
|
1501a4070e
|
[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2025-12-20 10:29:31 +00:00 |
|
Lucas Wilkinson
|
ff2168bca3
|
[CI] Fix fixture 'siglip_attention_config' not found (#31053)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-20 03:46:15 +00:00 |
|
Gregory Shtrasberg
|
0be149524c
|
[ROCm][CI/Build] Update ROCm dockerfiles (#30991)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-12-20 03:19:12 +00:00 |
|
zejunchen-zejun
|
d52c5096d7
|
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-12-20 09:03:35 +08:00 |
|
Yuxuan Zhang
|
8a7a414374
|
GLM-4.7 Tool Parser and Doc Update (#30876)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-12-20 00:09:58 +00:00 |
|
Robert Shaw
|
95befecc18
|
[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 23:36:38 +00:00 |
|
Wentao Ye
|
4cf9429897
|
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors' for Deepseek V3.2 (#31046)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 23:31:31 +00:00 |
|
Robert Shaw
|
83a317f650
|
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 13:09:54 -08:00 |
|
Lucas Wilkinson
|
5f6477d1d0
|
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-19 16:07:54 -05:00 |
|
Wentao Ye
|
3bd8335bd0
|
[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (#30898)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 13:50:39 -07:00 |
|