11447 Commits

Author SHA1 Message Date
inkcherry
7d3a93f1e7 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
63e6cff196 update proxy path
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
77321502e7 update lock
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
374cc25e0f format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
16d2a7a343 update finished request collection
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
1c10f47dc6 tp write single pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
b29f405aa5 update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
4776e2ddcf more
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
72ccb5d77c remove handle_proxy_request
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
38d51f6dd8 refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:59 +00:00
inkcherry
fd63437837 update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
0a3ae0b0cc update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9d29f361fb update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
96da87bfe0 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
857d93cbfb fix all commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
795a305b1b fix format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
e0885e52d9 break long line
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f75eecde0a fix all mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
3f7120368e fix mypy and tp test pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4c79f34e8a fix mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
9b90f5ddb2 update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
a0d74ebf7f fix format error
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
08cd2efbb6 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
bba4c89ca4 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4034937733 remove port
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
b60ee86585 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
4f592ae696 format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
245b71a891 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
64694c3e76 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
70ea1b2460 refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
68a2333339 fix dp proxy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:58 +00:00
inkcherry
f8e9adfea8 refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
ecbad2a70b add proxy example
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
e0f4336a5b format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
675943e018 fix dp router
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
a7ea23d16d fix with new main branch
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
b3e31b42d8 update gitignore
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
inkcherry
9a15ae9f72 initial commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-11-27 07:30:57 +00:00
Matthew Bonanni
4c23690f43
[Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307)
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
2025-11-19 03:12:31 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-19 02:10:02 +00:00
Jerry Zhang
da94c7c0eb
Move online quantization to model.load_weights (#26327)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-18 16:52:41 -08:00
tomeras91
1395461f5f
[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath
9912b8ccb8
[Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-18 16:45:20 -08:00
Johnny
49ef847aa8
[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
2025-11-18 16:44:27 -08:00
Michael Goin
67745d189f
Suppress verbose logs from model_hosting_container_standards (#28949)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-18 12:29:06 -08:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR (#28904)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-11-18 14:03:23 -05:00
Isotr0py
e4bb2684bc
[Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 18:56:04 +00:00