Yong Hoon Shin
|
11ac9ddd03
|
Support all interleaved layer types (#28485)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-11-13 08:57:20 +00:00 |
|
Chauncey
|
5c9ad138d5
|
[Frontend] supports interleaved thinking (#28531)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-13 16:14:13 +08:00 |
|
Jiangyun Zhu
|
fa183e9271
|
[Bugfix] fix kimi-linear crash (#28445)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-13 07:59:58 +00:00 |
|
usberkeley
|
4ab34f6ef1
|
Add NUMA node validation for CPU thread binding (#28555)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
|
2025-11-13 07:03:52 +00:00 |
|
Huy Do
|
c33b87e777
|
Use official xformers-0.0.33 built for PT 2.9 (#28600)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-12 22:48:53 -08:00 |
|
tjandy98
|
4504e8029b
|
[Bugfix] Prevent crash on empty grammar string (#28210)
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>
|
2025-11-13 06:42:29 +00:00 |
|
Pleaplusone
|
ca00b1bfc6
|
[ROCm][BugFix] Remove the usage of device_info from aiter (#28383)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-12 21:43:42 -08:00 |
|
Radu Salavat
|
d44fbbab0e
|
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
2025-11-13 05:43:08 +00:00 |
|
Lucia Fang
|
7e082bc14e
|
Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-11-12 21:40:45 -08:00 |
|
Fanli Lin
|
dbbe0c756a
|
[XPU] Support Triton path for LoRA operations on XPU (#28511)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
|
2025-11-13 05:31:42 +00:00 |
|
Pleaplusone
|
7dca0c90cb
|
[BugFix][ROCm] Fix get_cu_count missing variable error (#28608)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-13 05:18:56 +00:00 |
|
Andrew Xia
|
1a0b157a2e
|
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-13 04:47:22 +00:00 |
|
Andrew Xia
|
7c38ed0f1c
|
[Frontend] split append tool output (#28333)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-11-13 04:03:23 +00:00 |
|
Jialin Ouyang
|
a1d3866dda
|
[n-gen] DO NOT repeatedly return finished child requests (#28591)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-13 03:36:07 +00:00 |
|
Harry Mellor
|
97d1c99302
|
Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 19:14:33 -08:00 |
|
Harry Mellor
|
3226283461
|
[Docs] Add some details about what the MoE block needs for the Transformers backend (#28588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-13 03:12:14 +00:00 |
|
Nick Hill
|
8832fff972
|
[BugFix] Fix mm_encoder_attn_backend arg type checking (#28599)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-13 03:06:03 +00:00 |
|
Michael Goin
|
a543e678b4
|
[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support (#28561)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-12 19:40:59 -07:00 |
|
wangxiyuan
|
2dacd57394
|
[platform] Move get_cu_count to utils (#27005)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-13 08:48:47 +08:00 |
|
Gregory Shtrasberg
|
d75ad04818
|
[ROCm][Bugfix] Revert removing setuptools version restriction (#28592)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-11-12 16:46:58 -08:00 |
|
Michael Goin
|
52eadcec9e
|
[Docs] Update meetups.md description (#28583)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-13 00:00:23 +00:00 |
|
Harry Mellor
|
51c599f0ec
|
Skip models that cannot currently init on Transformers v5 (#28471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 23:43:57 +00:00 |
|
Alexander Matveev
|
69d0e90313
|
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-11-12 23:37:24 +00:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
4ca5cd5740
|
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2025-11-12 15:24:12 -08:00 |
|
Michael Goin
|
10f01d5a3a
|
[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX (#28294)
|
2025-11-12 15:14:13 -08:00 |
|
QiliangCui
|
3eb0c2673e
|
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-11-12 22:31:14 +00:00 |
|
vllmellm
|
d8140b9833
|
[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in _aiter_ops.py (#28464)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-12 21:46:57 +00:00 |
|
Varun Sundar Rabindranath
|
74a9a9faad
|
[Performance][B200] Fix deepgemm prologue (#27897)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-12 13:13:03 -08:00 |
|
Wei Wei
|
478ee511de
|
[Misc]Fix typo in llm_engine.py (#28584)
Signed-off-by: Wei Wei <wwei6@meta.com>
|
2025-11-12 12:59:43 -08:00 |
|
Andy Lo
|
58ce8d12b7
|
[BugFix] Priority scheduling and spec tokens preemption (#28558)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-11-12 20:29:21 +00:00 |
|
Yihua Cheng
|
94a9ebcf31
|
[KV connector][WIP] KV cache proxy based on LMCache multi-process mode (#27902)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2025-11-12 20:25:43 +00:00 |
|
Harry Mellor
|
a39dd7bb06
|
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 19:38:13 +00:00 |
|
Thomas Parnell
|
64d57c3be7
|
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model (#28563)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-11-12 18:17:55 +00:00 |
|
PerryZhang01
|
a1e7fa362a
|
[EPLB][ROCm]: support EPBL for ROCm backend (#27731)
Signed-off-by: Perry Zhang <perzhang@amd.com>
Co-authored-by: Perry Zhang <perzhang@amd.com>
|
2025-11-12 18:16:35 +00:00 |
|
alberto
|
bac904565f
|
Implement ARC KV cache eviction policy for CPU offloader (#27039)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: alberto <aperdomo@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2025-11-12 09:51:39 -08:00 |
|
Benjamin Chislett
|
304419576a
|
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-13 01:56:40 +09:00 |
|
Harry Mellor
|
a742134cc5
|
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 16:10:28 +00:00 |
|
Nicolò Lucchesi
|
728a9eb70e
|
[Misc] Refactor Attention kv transfer methods into decorator (#27816)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-12 16:05:44 +00:00 |
|
Canlin Guo
|
bc5bd45c7d
|
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#28271)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-12 15:56:47 +00:00 |
|
Alexander Matveev
|
f76e85c299
|
[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) (#28492)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-11-12 10:51:43 -05:00 |
|
Harry Mellor
|
54aecd9ed5
|
Fix pre-commit (and XPU) on main (#28556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 06:13:41 -08:00 |
|
wangxiyuan
|
10138c92a5
|
[V0 deprecation] Deprecate use_v1 parameter (#28112)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-12 14:03:52 +00:00 |
|
Jee Jee Li
|
a9d18b5107
|
[Bugfix] Fix gpt_oss packed_modules_mapping (#28536)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-12 21:02:06 +08:00 |
|
TJian
|
edb59a9470
|
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility (#28500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-11-12 05:01:14 -08:00 |
|
ZhengHongming888
|
c5f10cc139
|
add cpu option for p/d in nixl_connector (#28356)
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
|
2025-11-12 11:53:08 +00:00 |
|
ziruiliu
|
d143152308
|
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector (#27978)
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Signed-off-by: ziruiliu <ziliu@ddn.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-12 11:44:58 +01:00 |
|
Chaojun Zhang
|
a4730c1b4f
|
[XPU]Fix crash due to removed VLLM_USE_V1 attribute (#28520)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
|
2025-11-12 10:20:55 +00:00 |
|
wuyaoxuehun
|
d3ade61e42
|
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597)
Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com>
Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com>
|
2025-11-12 10:14:00 +00:00 |
|
yyzxw
|
1761dea1a8
|
[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733)
Signed-off-by: zxw <1020938856@qq.com>
|
2025-11-12 09:03:56 +00:00 |
|
Huamin Li
|
c748355e0d
|
[CI] Introduce autorun_on_main feature (#27836)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-12 08:51:19 +00:00 |
|