12302 Commits

Author SHA1 Message Date
jiangkuaixue123
93c656e09b
Merge pull request #18 from jiangkuaixue123/afd-step3-merge
[Feature] adapt step3 model with AFD
2025-12-23 00:05:19 +08:00
i-yuanyukun
60d65cdf5c [Chore] remove unused method 2025-12-22 15:37:07 +08:00
i-yuanyukun
27ae2e761c [Chore] clean up debug info 2025-12-22 15:25:28 +08:00
i-yuanyukun
2a98ab3c8e [Chore]: step3 forward_with_afd 2025-12-22 14:29:03 +08:00
i-yuanyukun
6d305dda38 [Chore] add p2p connector debug log info 2025-12-19 16:11:47 +08:00
i-yuanyukun
bde36017fa [Chore] adjust log info 2025-12-19 16:04:47 +08:00
i-yuanyukun
65ea10c8f4 [Chore] bring back deleted code 2025-12-19 16:03:47 +08:00
i-yuanyukun
11d7d5bf59 [Chore] some log info 2025-12-19 16:02:07 +08:00
i-yuanyukun
6a8d35a9b6 [Chore] remove p2p connector duplicate code 2025-12-18 17:32:42 +08:00
i-yuanyukun
8276320a8a [Bugfix] compute ffn output param order 2025-12-18 17:03:15 +08:00
i-yuanyukun
26ddfa299c [Chore] remove duplicate code 2025-12-18 17:02:39 +08:00
i-yuanyukun
f74bb82909 [Chore] code lint 2025-12-18 15:56:43 +08:00
i-yuanyukun
cd16bcff1e [Chore] resolve some bugs due to merge 2025-12-18 15:56:20 +08:00
i-yuanyukun
d306d01dd7 [Feat] adapt step3 text model 2025-12-18 14:30:55 +08:00
jiangkuaixue123
36f9c3d6b5 add log
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
2025-12-16 15:49:36 +08:00
jiangkuaixue123
00570c9fac ffn dp use all2all
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
2025-12-16 15:49:36 +08:00
jiangkuaixue123
eb2355c600 ffn server use vllm serve and dp
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
2025-12-16 15:49:36 +08:00
jiangkuaixue123
28cba040c7 afd use ubatch without thread
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
2025-12-16 15:49:36 +08:00
jiangkuaixue123
bd8fe276f5 1.add afd
2.support afd with DBO.
3.support AFDP2PConnector
4.support afd with deepseekv2

Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
2025-12-16 15:49:35 +08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser (#30158)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
Isotr0py
e94384bbad
[Bugfix] Fix broken ViT attention selection for Blackwell device (#30731)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 05:24:32 +00:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO (#30120)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Boyuan Feng
c881db364e
improve lazy import test (#30733)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 03:12:05 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873)
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
Amr Mahdi
ff21a0fc85
[docker] Restructure Dockerfile for more efficient and cache-friendly builds (#30626)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-15 18:52:19 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony (#30237)
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Shengqi Chen
511e81e7c9
[BUILD] use sm_100f when compiling flashmla to fix support on sm103 (#30705)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-15 14:48:01 -08:00
Matthew Bonanni
a182be4308
[UX][Attention] Add attention_config argument to LLM() (#30710)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-15 17:29:09 -05:00
Kevin Musgrave
c01d589813
[Benchmarks] auto_tune.sh: Use hostname variable for server requests (#30529)
Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 22:00:29 +00:00
Matthew Bonanni
60dbf7d8f1
Update batch invariant to use attention config (#30704)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 15:24:16 -05:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Fadi Arafeh
b2191abdca
[docs][fix] Update Arm CPU vLLM wheel installation docs (#30594)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-15 19:46:25 +00:00
Matthew Bonanni
51e5b3e3c4
[Bugfix] Fix ViT with FlashAttention on ROCm (#30703)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-15 19:45:21 +00:00
Isotr0py
ec154c36ee
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (#30212)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 17:36:07 +00:00
Harry Mellor
970713d4a4
Remove SkipValidation from ModelConfig (#30695)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-15 17:34:08 +00:00
mondaylord
17fec3af09
[Bugfix] Fix missing first token in tool calls during reasoning-to-tool transition (#30671)
Signed-off-by: mondaylord <20212010046@fudan.edu.cn>
2025-12-15 16:13:37 +00:00
yjc9696
855b101d75
[Frontend] add tools for dsv32 developer role (#30040)
Signed-off-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 15:08:47 +00:00
Robert Shaw
d0502b4928
[MoE][Refactor 1/N] Separate Online Quantization (#30627)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-15 06:54:53 -08:00
Max Hu
3f175f18a2
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
2025-12-15 14:06:01 +00:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU (#30693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-15 13:45:36 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Nicolò Lucchesi
185c22bf2f
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector (#29805)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-15 11:17:58 +00:00
duke
e4806d973a
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm model (#30674)
Signed-off-by: root <iwzbi@zju.edu.cn>
Co-authored-by: root <iwzbi@zju.edu.cn>
2025-12-15 10:38:29 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model (#30666)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
ゆり
33278073d6
typing: Add type hints to TurnMetrics class in context.py (#30552)
Co-authored-by: zkexorability <zkexorability@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 23:00:39 -08:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) (#28439)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Kunshang Ji
e3a1cd1c59
[XPU] fix Dockerfile.xpu, avoid wheel conflicts (#30662)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-12-15 13:32:06 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel (#30282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Seokhyun An
b337647aa0
[Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template (#30648)
Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
2025-12-15 04:21:12 +00:00
Jee Jee Li
a524d1ba0a
[Bugfix] Fix deepseek_v32 tokenizer_mode (#30658)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 04:20:31 +00:00