inkcherry
f75eecde0a
fix all mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
3f7120368e
fix mypy and tp test pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4c79f34e8a
fix mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9b90f5ddb2
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
a0d74ebf7f
fix format error
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
08cd2efbb6
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
bba4c89ca4
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4034937733
remove port
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
b60ee86585
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4f592ae696
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
245b71a891
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
64694c3e76
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
70ea1b2460
refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
68a2333339
fix dp proxy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f8e9adfea8
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
ecbad2a70b
add proxy example
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
e0f4336a5b
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
675943e018
fix dp router
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
a7ea23d16d
fix with new main branch
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
b3e31b42d8
update gitignore
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
9a15ae9f72
initial commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
Johnny Yang
3ecabd06ee
Fix tpu-inference platform path (#29554)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 23:25:21 -08:00
Jee Jee Li
c069086b9c
[Bugfix] Fix getting device for MoE LoRA (#29475)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-26 23:16:07 -08:00
Woosuk Kwon
11ea5ec1ff
[Model Runner V2] Refactor CudaGraphManager (#29583)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 21:37:59 -08:00
Fadi Arafeh
ecb1952378
[cpu][fix] Fix Arm CI tests (#29552)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-27 13:09:41 +08:00
TJian
da8e1a1bf9
[DOC] Add vLLM Bangkok Meetup info (#29561)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-27 04:42:50 +00:00
Woosuk Kwon
ee80aee1ca
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 20:10:12 -08:00
Woosuk Kwon
0aeb698b77
[Model Runner V2] Minor code cleanup (#29570)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 19:47:17 -08:00
Louie Tsai
9bb33c8919
add xpu supported model and model id for cpu (#29380)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-11-27 11:30:50 +08:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-26 19:02:21 -08:00
Matthew Bonanni
77740191de
[Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 (#29449)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 18:48:43 -08:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
2025-11-26 21:35:13 -05:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference (#27277)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 14:46:36 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Alec
c4c0354eec
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
2025-11-26 17:41:16 +00:00
HDCharles
e603129505
[refactor] CTConfig methods to static/class methods (#28870)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-26 17:21:58 +00:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
yxt
3650a74ed8
Optimize the wording of the document and unify the terminology and th… (#29491)
2025-11-26 05:16:12 -08:00
Yejing Lai
bb706d6048
Fix TeleChatForCausalLM not register issue (#29473)
Signed-off-by: Lai, Yejing <yejing.lai@intel.com>
2025-11-26 05:15:00 -08:00
Cyrus Leung
e30859dff3
[Bugfix] Fix handling of image embeds in models (#29480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-26 05:00:15 -08:00
Roger Wang
452a7c9f7c
[Misc] Allow LM only loading for Pixtral (#29451)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-26 05:00:00 -08:00
Pleaplusone
d9d342d214
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-26 12:45:28 +08:00
Xin Yang
53d7f1f601
[Kernel] Use pre-allocated output buffer for triton kernel fused_experts (#29219)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-11-26 10:21:00 +08:00
dependabot[bot]
c5ee430328
Bump actions/checkout from 4 to 6 (#29293)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-26 01:57:08 +00:00
Michael Goin
8d6a89dffd
[UX] Suppress gloo log spam (#29250)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-25 17:19:35 -08:00
George D. Torres
56531b79cc
[Misc] Add backup hash algorithm for FIPS constrained environments (#28795)
Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-26 00:50:22 +00:00
Xieyang Xu
12866af748
dummy run corner case (#29433)
2025-11-26 00:20:35 +00:00
Lucia Fang
d8819c88eb
fix assertion for single world use case (uni) (#29429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-11-26 00:14:23 +00:00