inkcherry
|
38d51f6dd8
|
refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:59 +00:00 |
|
inkcherry
|
fd63437837
|
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
0a3ae0b0cc
|
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
9d29f361fb
|
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
96da87bfe0
|
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
857d93cbfb
|
fix all commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
795a305b1b
|
fix format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
e0885e52d9
|
break long line
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
f75eecde0a
|
fix all mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
3f7120368e
|
fix mypy and tp test pass
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
4c79f34e8a
|
fix mypy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
9b90f5ddb2
|
update
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
a0d74ebf7f
|
fix format error
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
08cd2efbb6
|
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
bba4c89ca4
|
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
4034937733
|
remove port
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
b60ee86585
|
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
4f592ae696
|
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
245b71a891
|
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
64694c3e76
|
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
70ea1b2460
|
refine code
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
68a2333339
|
fix dp proxy
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:58 +00:00 |
|
inkcherry
|
f8e9adfea8
|
refine
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
ecbad2a70b
|
add proxy example
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
e0f4336a5b
|
format
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
675943e018
|
fix dp router
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
a7ea23d16d
|
fix with new main branch
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
b3e31b42d8
|
update gitignore
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
inkcherry
|
9a15ae9f72
|
initial commit
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-11-27 07:30:57 +00:00 |
|
Matthew Bonanni
|
4c23690f43
|
[Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-18 20:06:21 -08:00 |
|
Strahinja Stamenkovic
|
814843e021
|
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307)
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
|
2025-11-19 03:12:31 +00:00 |
|
Li, Jiang
|
20852c8f4c
|
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-19 10:32:00 +08:00 |
|
Jialin Ouyang
|
40b6b38f2c
|
[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-11-19 02:10:02 +00:00 |
|
Jerry Zhang
|
da94c7c0eb
|
Move online quantization to model.load_weights (#26327)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-11-18 16:52:41 -08:00 |
|
tomeras91
|
1395461f5f
|
[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-11-18 16:49:36 -08:00 |
|
Varun Sundar Rabindranath
|
9912b8ccb8
|
[Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-18 16:45:20 -08:00 |
|
Johnny
|
49ef847aa8
|
[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
|
2025-11-18 16:44:27 -08:00 |
|
Michael Goin
|
67745d189f
|
Supress verbose logs from model_hosting_container_standards (#28949)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-18 12:29:06 -08:00 |
|
Kunshang Ji
|
2a2d5d2780
|
Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-18 11:34:36 -08:00 |
|
Chendi.Xue
|
c3e2978620
|
[NIXL] fix cpu PD after physical <> logical block_size PR (#28904)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-11-18 14:03:23 -05:00 |
|
Isotr0py
|
e4bb2684bc
|
[Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-18 18:56:04 +00:00 |
|
Kevin H. Luu
|
c64c0b78de
|
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 09:44:18 -08:00 |
|
vllmellm
|
0af3d4f0df
|
[FEAT] [AITER] [ROCm] integrate aiter sampling ops (#26084)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-18 17:28:34 +00:00 |
|
Nick Hill
|
da8dadf68b
|
[Minor] Rename ec_producer field to is_ec_producer (#28884)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 17:26:07 +00:00 |
|
Nicolò Lucchesi
|
f226a3f0c1
|
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-18 09:22:30 -08:00 |
|
Luciano Martins
|
c2612371ad
|
[Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-18 08:56:29 -08:00 |
|
Ido Segev
|
49a986ecd4
|
[Benchmark] multi_turn: Report warmup-inclusive runtime (#28937)
Signed-off-by: Ido Segev <idos@pliops.com>
|
2025-11-18 16:38:22 +00:00 |
|
Alex
|
f6aa122698
|
[CI Sprint] Quantization CI Cleanup (#24130)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-11-18 09:21:48 -05:00 |
|
Nicolò Lucchesi
|
184b12fdc6
|
[Bugfix][NIXL] Fix block_size_ratio when logical !=physical blocks (#28925)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-18 22:07:50 +08:00 |
|
Canlin Guo
|
b9489f51e1
|
[Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-18 11:51:54 +00:00 |
|