12536 Commits

Author SHA1 Message Date
inkcherry
4776e2ddcf more
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
72ccb5d77c remove handle_proxy_request
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
38d51f6dd8 refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
fd63437837 update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
0a3ae0b0cc update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9d29f361fb update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
96da87bfe0 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
857d93cbfb fix all commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
795a305b1b fix format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
e0885e52d9 break long line
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f75eecde0a fix all mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
3f7120368e fix mypy and tp test pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4c79f34e8a fix mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9b90f5ddb2 update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
a0d74ebf7f fix format error
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
08cd2efbb6 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
bba4c89ca4 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4034937733 remove port
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
b60ee86585 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4f592ae696 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
245b71a891 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
64694c3e76 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
70ea1b2460 refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
68a2333339 fix dp proxy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f8e9adfea8 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
ecbad2a70b add proxy example
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
e0f4336a5b format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
675943e018 fix dp router
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
a7ea23d16d fix with new main branch
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
b3e31b42d8 update gitignore
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
9a15ae9f72 initial commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
Johnny Yang
3ecabd06ee
Fix tpu-inference platform path (#29554)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 23:25:21 -08:00
Jee Jee Li
c069086b9c
[Bugfix] Fix getting device for MoE LoRA (#29475)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-26 23:16:07 -08:00
Woosuk Kwon
11ea5ec1ff
[Model Runner V2] Refactor CudaGraphManager (#29583)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 21:37:59 -08:00
Fadi Arafeh
ecb1952378
[cpu][fix] Fix Arm CI tests (#29552)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-27 13:09:41 +08:00
TJian
da8e1a1bf9
[DOC] Add vLLM Bangkok Meetup info (#29561)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-27 04:42:50 +00:00
Woosuk Kwon
ee80aee1ca
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 20:10:12 -08:00
Woosuk Kwon
0aeb698b77
[Model Runner V2] Minor code cleanup (#29570)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 19:47:17 -08:00
Louie Tsai
9bb33c8919
add xpu supported model and model id for cpu (#29380)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-11-27 11:30:50 +08:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-26 19:02:21 -08:00
Matthew Bonanni
77740191de
[Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 (#29449)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 18:48:43 -08:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
2025-11-26 21:35:13 -05:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference (#27277)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 14:46:36 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Alec
c4c0354eec
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
2025-11-26 17:41:16 +00:00
HDCharles
e603129505
[refactor] CTConfig methods to static/class methods (#28870)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-26 17:21:58 +00:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
yxt
3650a74ed8
Optimize the wording of the document and unify the terminology and th… (#29491)
2025-11-26 05:16:12 -08:00