sarathc-cerebras
|
28d15ab56b
|
adds jais 2 support (#30188)
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-12-18 15:46:58 +00:00 |
|
Wentao Ye
|
6628758233
|
[Bug] Fix batch invariant in torch 2.10 (#30907)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 07:27:51 -08:00 |
|
zhrrr
|
eee600c34f
|
[Misc] support nsys profile for bench latency (#29776)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-12-18 14:52:20 +00:00 |
|
Michael Goin
|
100f93d2be
|
Filter safetensors files to download if .safetensors.index.json exists (#30537)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-18 14:51:17 +00:00 |
|
vllmellm
|
96bf50a2c0
|
[ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-12-18 11:47:46 +00:00 |
|
Li, Jiang
|
f90d3636e2
|
[Bugfix][CPU] Fix Mac CPU build (#30955)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-18 01:38:22 -08:00 |
|
Ming Yang
|
8372be2828
|
[moe] Use enable_chunking func (to support disabling chunking) (#29935)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-12-18 09:02:38 +00:00 |
|
Andreas Karatzas
|
8da6ae49c3
|
[ROCm][Bugfix] Fix fa_version argument error in flash_attn_maxseqlen_wrapper for ROCm without aiter (#30909)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-18 16:45:51 +08:00 |
|
Lucas Wilkinson
|
30bb19a760
|
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-17 23:50:15 -08:00 |
|
Chauncey
|
aa7e836055
|
[Bugfix] Fix Unicode issues in GLM-4 tool calling (#30920)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-18 07:12:17 +00:00 |
|
Andreas Karatzas
|
be2ad5f920
|
[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-18 07:04:57 +00:00 |
|
wangxiyuan
|
a85724bd6e
|
[Platform] Let EPD work with non-cuda platform (#30225)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-12-18 06:45:29 +00:00 |
|
Yifan Qiao
|
11a89cf95c
|
[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2025-12-18 06:42:21 +00:00 |
|
Li, Jiang
|
e3ab93c896
|
[CPU] Refactor CPU fused MOE (#30531)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-18 14:36:49 +08:00 |
|
Nathan Price
|
fc2ae6d617
|
fix: add warmup for audio preprocessing (#30706)
Signed-off-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 06:12:29 +00:00 |
|
Yihua Cheng
|
ec965569d9
|
[KV connector][LMCache] Only record the cuda event when there are request to store/load (#30814)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2025-12-18 05:31:34 +00:00 |
|
Divakar Verma
|
82dc338ad6
|
[AMD][CI] fix lm eval ci arg (#30911)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-18 13:18:26 +08:00 |
|
Vadim Gimpelson
|
717ac33d9c
|
[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-12-18 13:16:04 +08:00 |
|
Li, Jiang
|
cfb7e55515
|
[Doc][CPU] Update CPU doc (#30765)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-18 04:59:09 +00:00 |
|
zzhxxx
|
b166ef20e1
|
[refactor] Add prefix support to embed_tokens in DeepSeek MTP (#30788)
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
|
2025-12-18 04:45:56 +00:00 |
|
Zhengxu Chen
|
5f2f3fba1d
|
[compile] Fix CI for test_gpt2_cache_hit (#30902)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 20:22:23 -08:00 |
|
Matthew Bonanni
|
4a8412f773
|
[UX] Reduce DeepGEMM warmup log output to single progress bar (#30903)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-17 20:21:51 -08:00 |
|
Bowen Bao
|
0c738b58bc
|
[Quantization] Support Quark int4-fp8 w4a8 for MoE (#30071)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2025-12-18 04:20:42 +00:00 |
|
gnovack
|
5a3adf581e
|
fused_moe_lora PDL improvements (#30716)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-17 19:55:00 -08:00 |
|
Isotr0py
|
6fe5887652
|
[Chore] Remove v0 dead code for Qwen2.5-omni (#30883)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-17 19:54:39 -08:00 |
|
Nicolò Lucchesi
|
bc3700e0cd
|
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-18 11:53:30 +08:00 |
|
Micah Williamson
|
fd8afdf38d
|
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-18 10:27:37 +08:00 |
|
SungMinCho
|
a0b782f9cc
|
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-18 01:40:51 +00:00 |
|
Rafael Vasquez
|
ed2897f336
|
[CI][Feature] Adds auto-rebase PR rule (#30875)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2025-12-18 00:46:44 +00:00 |
|
Isotr0py
|
74a1ac38b0
|
[v1] Add PrefixLM support to TritonAttention backend (#30386)
|
2025-12-17 16:05:24 -08:00 |
|
Nathan Price
|
05a83dc6ee
|
feat(api): Eager chat template warmup to eliminate first-request latency (#30700)
Signed-off-by: Nathan Price <nathan@abridge.com>
|
2025-12-18 00:01:29 +00:00 |
|
Varun Sundar Rabindranath
|
e3fc374a9a
|
[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899)
|
2025-12-17 15:00:59 -08:00 |
|
Andrey Talman
|
e06d0bf0aa
|
2.9.1 PyTorch release update (#28495)
|
2025-12-17 12:20:22 -08:00 |
|
Xunzhuo
|
e3a0f21e6c
|
[docs]: add ecosystem projects sr in docs/governance (#30844)
Signed-off-by: bitliu <bitliu@tencent.com>
|
2025-12-17 18:45:56 +00:00 |
|
Matthew Bonanni
|
7eb6cb6c18
|
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-17 09:49:59 -08:00 |
|
Nicolò Lucchesi
|
9ca8cb38fd
|
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-17 18:49:56 +01:00 |
|
Cyrus Leung
|
2497228ad4
|
[Chore] Factor out logic for requesting initial memory (#30868)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-17 07:32:17 -08:00 |
|
KimHyemin
|
196cdc3224
|
[Model] Gemma3: Support untied word embeddings (#30827)
Signed-off-by: www-spam <panmahm@naver.com>
|
2025-12-17 07:11:18 -08:00 |
|
高鑫崧
|
b7b6a60aca
|
Adapt the old parameter enable_thinking in chat_template_kwargs (#30852)
Signed-off-by: xinsong.gao <1418762819@qq.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-17 07:10:59 -08:00 |
|
rongfu.leng
|
9e67c4ce98
|
[Docs] fix function name (#30748)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-12-17 12:14:45 +00:00 |
|
Jialin Ouyang
|
6e9dbcc50e
|
[Fix] uniform decode batch check (#30747)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-12-17 19:58:43 +08:00 |
|
Hank_
|
6482e3895b
|
chores: adjust the attn register param order (#30688)
Signed-off-by: Hank <hcc.mayday@gmail.com>
|
2025-12-17 19:58:16 +08:00 |
|
Harry Mellor
|
fb980eb2fd
|
Fix lazy import (#30858)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-17 03:33:50 -08:00 |
|
baoqian426
|
84896fda22
|
[Bugfix] deepseek-V3.2 self.weights_proj has no bias (#30841)
Signed-off-by: baoqian <1354987947@qq.com>
Signed-off-by: baoqian426 <1354987947@qq.com>
|
2025-12-17 03:32:34 -08:00 |
|
Kevin H. Luu
|
4bf6c23668
|
[ci] Sync test areas yaml file with test-pipeline (#30862)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2025-12-17 02:30:56 -08:00 |
|
Chauncey
|
9ad5b21710
|
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-17 02:27:30 -08:00 |
|
Wentao Ye
|
f284d7bd0c
|
[Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attribute weight_scale_inv (#30823)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-17 02:00:35 -08:00 |
|
Zhengxu Chen
|
53cd7f868b
|
[compile] Recompile graph module during Dynamo cache loading. (#30743)
Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>
|
2025-12-17 02:00:12 -08:00 |
|
danielafrimi
|
7b966ae2ba
|
[Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix for reverted PR) (#30785)
Signed-off-by: <>
Co-authored-by: root <root@gpu-937.slurm-workers-slurm.slurm.svc.cluster.local>
|
2025-12-17 01:56:38 -08:00 |
|
Zhengxu Chen
|
9db1db5949
|
[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 01:56:24 -08:00 |
|