inkcherry
4776e2ddcf
more
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
72ccb5d77c
remove handle_proxy_request
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
38d51f6dd8
refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
fd63437837
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
0a3ae0b0cc
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9d29f361fb
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
96da87bfe0
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
857d93cbfb
fix all commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
795a305b1b
fix format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
e0885e52d9
break long line
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f75eecde0a
fix all mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
3f7120368e
fix mypy and tp test pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4c79f34e8a
fix mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9b90f5ddb2
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
a0d74ebf7f
fix format error
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
08cd2efbb6
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
bba4c89ca4
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4034937733
remove port
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
b60ee86585
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4f592ae696
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
245b71a891
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
64694c3e76
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
70ea1b2460
refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
68a2333339
fix dp proxy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f8e9adfea8
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
ecbad2a70b
add proxy example
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
e0f4336a5b
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
675943e018
fix dp router
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
a7ea23d16d
fix with new main branch
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
b3e31b42d8
update gitignore
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
9a15ae9f72
initial commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
Johnny Yang
3ecabd06ee
Fix tpu-inference platform path (#29554)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 23:25:21 -08:00
Jee Jee Li
c069086b9c
[Bugfix] Fix getting device for MoE LoRA (#29475)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-26 23:16:07 -08:00
Woosuk Kwon
11ea5ec1ff
[Model Runner V2] Refactor CudaGraphManager (#29583)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 21:37:59 -08:00
Fadi Arafeh
ecb1952378
[cpu][fix] Fix Arm CI tests (#29552)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-27 13:09:41 +08:00
TJian
da8e1a1bf9
[DOC] Add vLLM Bangkok Meetup info (#29561)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-27 04:42:50 +00:00
Woosuk Kwon
ee80aee1ca
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 20:10:12 -08:00
Woosuk Kwon
0aeb698b77
[Model Runner V2] Minor code cleanup (#29570)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 19:47:17 -08:00
Louie Tsai
9bb33c8919
add xpu supported model and model id for cpu (#29380)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-11-27 11:30:50 +08:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-26 19:02:21 -08:00
Matthew Bonanni
77740191de
[Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 (#29449)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 18:48:43 -08:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
2025-11-26 21:35:13 -05:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference (#27277)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 14:46:36 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Alec
c4c0354eec
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
2025-11-26 17:41:16 +00:00
HDCharles
e603129505
[refactor] CTConfig methods to static/class methods (#28870)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-26 17:21:58 +00:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
yxt
3650a74ed8
Optimize the wording of the document and unify the terminology and th… (#29491)
2025-11-26 05:16:12 -08:00