Nick Hill
b0b77c4655
[BugFix] Fix spec decode + structured outputs + preemption edge case ( #30916 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 12:59:55 -08:00
Kayvan Mivehnejad
634a14bd7d
Strengthen input validation and tests for 'parse_raw_prompts'. ( #30652 )
...
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
2025-12-18 19:51:58 +00:00
Chen Zhang
24b65eff0d
[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 ( #30319 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-12-18 19:47:56 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Wentao Ye
97000a2be7
[Bug] Fix compressed tensor not using deepgemm ( #30820 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-18 14:45:55 -05:00
Isotr0py
d2dc5dfc6e
[Bugfix] Remove tile_size=64 for mm_prefix triton attention ( #30973 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-18 20:42:32 +01:00
navmarri14
b8c477c115
tuned fused configs for B300 ( #30629 )
2025-12-18 11:41:59 -08:00
jiahanc
53ad423f26
[Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary ( #30729 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-18 14:31:18 -05:00
wz1qqx
889f8bb250
[BugFix] Reclaim resources to prevent memory leaks when using LMCacheMPConnector ( #30745 )
...
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
2025-12-18 19:09:51 +00:00
Fanli Lin
058926d48c
[XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU ( #30935 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-18 10:16:36 -08:00
Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface ( #30684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-19 02:04:19 +08:00
Alec
62be3670cb
[BugFix] Add sleep to fix tight loop and release GIL ( #29476 )
...
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-18 09:52:55 -08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests ( #29002 )
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
Nick Hill
686cbaac64
[Cleanup] Remove unused ModelRunner V1 InputBatch.num_tokens field ( #30218 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 09:17:00 -08:00
Vasiliy Kuznetsov
f4ee2c3d90
fix fp8 online quantization streaming with tp > 1 ( #30900 )
...
Signed-off-by: vasiliy <vasiliy@fb.com>
2025-12-18 11:45:15 -05:00
Xin Yang
9a5e96523b
[LoRA] Set default MXFP4 LoRA backend to Marlin ( #30598 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-18 08:42:22 -08:00
wzyrrr
326e7c3105
[Doc] Add Sophgo TPU Support ( #30949 )
...
Co-authored-by: zhaoyang.wang <zhaoyang.wang@sophgo.com>
2025-12-18 16:29:33 +00:00
Lucas Kabela
0db5439ded
[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC ( #30822 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-18 08:23:31 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support ( #30188 )
...
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-18 15:46:58 +00:00
Wentao Ye
6628758233
[Bug] Fix batch invariant in torch 2.10 ( #30907 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-18 07:27:51 -08:00
zhrrr
eee600c34f
[Misc] support nsys profile for bench latency ( #29776 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2025-12-18 14:52:20 +00:00
Michael Goin
100f93d2be
Filter safetensors files to download if .safetensors.index.json exists ( #30537 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-18 14:51:17 +00:00
vllmellm
96bf50a2c0
[ROCm] Serving Fails on Radeon Due to AITER Dtype Import ( #30952 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-12-18 11:47:46 +00:00
Li, Jiang
f90d3636e2
[Bugfix][CPU] Fix Mac CPU build ( #30955 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 01:38:22 -08:00
Ming Yang
8372be2828
[moe] Use enable_chunking func (to support disabling chunking) ( #29935 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-18 09:02:38 +00:00
Andreas Karatzas
8da6ae49c3
[ROCm][Bugfix] Fix fa_version argument error in flash_attn_maxseqlen_wrapper for ROCm without aiter ( #30909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-18 16:45:51 +08:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Chauncey
aa7e836055
[Bugfix] Fix Unicode issues in GLM-4 tool calling ( #30920 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-18 07:12:17 +00:00
Andreas Karatzas
be2ad5f920
[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties ( #30730 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-18 07:04:57 +00:00
wangxiyuan
a85724bd6e
[Platform] Let EPD work with non-cuda platform ( #30225 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-18 06:45:29 +00:00
Yifan Qiao
11a89cf95c
[Fix][FlexAttention] return max logical block index to handle reused blocks ( #30915 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-18 06:42:21 +00:00
Li, Jiang
e3ab93c896
[CPU] Refactor CPU fused MOE ( #30531 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 14:36:49 +08:00
Nathan Price
fc2ae6d617
fix: add warmup for audio preprocessing ( #30706 )
...
Signed-off-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-18 06:12:29 +00:00
Yihua Cheng
ec965569d9
[KV connector][LMCache] Only record the cuda event when there are requests to store/load ( #30814 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2025-12-18 05:31:34 +00:00
Divakar Verma
82dc338ad6
[AMD][CI] fix lm eval ci arg ( #30911 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-18 13:18:26 +08:00
Vadim Gimpelson
717ac33d9c
[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json ( #29553 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-12-18 13:16:04 +08:00
Li, Jiang
cfb7e55515
[Doc][CPU] Update CPU doc ( #30765 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 04:59:09 +00:00
zzhxxx
b166ef20e1
[refactor] Add prefix support to embed_tokens in DeepSeek MTP ( #30788 )
...
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
2025-12-18 04:45:56 +00:00
Zhengxu Chen
5f2f3fba1d
[compile] Fix CI for test_gpt2_cache_hit ( #30902 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-17 20:22:23 -08:00
Matthew Bonanni
4a8412f773
[UX] Reduce DeepGEMM warmup log output to single progress bar ( #30903 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 20:21:51 -08:00
Bowen Bao
0c738b58bc
[Quantization] Support Quark int4-fp8 w4a8 for MoE ( #30071 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com>
2025-12-18 04:20:42 +00:00
gnovack
5a3adf581e
fused_moe_lora PDL improvements ( #30716 )
...
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-17 19:55:00 -08:00
Isotr0py
6fe5887652
[Chore] Remove v0 dead code for Qwen2.5-omni ( #30883 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-17 19:54:39 -08:00
Nicolò Lucchesi
bc3700e0cd
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size ( #27274 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-18 11:53:30 +08:00
Micah Williamson
fd8afdf38d
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 ( #30811 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-18 10:27:37 +08:00
SungMinCho
a0b782f9cc
[Metrics] Model FLOPs Utilization estimation ( #30738 )
...
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-18 01:40:51 +00:00
Rafael Vasquez
ed2897f336
[CI][Feature] Adds auto-rebase PR rule ( #30875 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-18 00:46:44 +00:00
Isotr0py
74a1ac38b0
[v1] Add PrefixLM support to TritonAttention backend ( #30386 )
2025-12-17 16:05:24 -08:00
Nathan Price
05a83dc6ee
feat(api): Eager chat template warmup to eliminate first-request latency ( #30700 )
...
Signed-off-by: Nathan Price <nathan@abridge.com>
2025-12-18 00:01:29 +00:00
Varun Sundar Rabindranath
e3fc374a9a
[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM ( #30899 )
2025-12-17 15:00:59 -08:00