| Author | Commit | Message | Date |
|---|---|---|---|
| inkcherry | 4c79f34e8a | fix mypy; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 9b90f5ddb2 | update; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | a0d74ebf7f | fix format error; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 08cd2efbb6 | refine; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | bba4c89ca4 | format; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 4034937733 | remove port; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | b60ee86585 | format; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 4f592ae696 | format; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 245b71a891 | refine; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 64694c3e76 | refine; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 70ea1b2460 | refine code; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | 68a2333339 | fix dp proxy; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:58 +00:00 |
| inkcherry | f8e9adfea8 | refine; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | ecbad2a70b | add proxy example; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | e0f4336a5b | format; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | 675943e018 | fix dp router; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | a7ea23d16d | fix with new main branch; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | b3e31b42d8 | update gitignore; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| inkcherry | 9a15ae9f72 | initial commit; Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2025-11-27 07:30:57 +00:00 |
| Matthew Bonanni | 4c23690f43 | [Attention] FlashAttention ViT support, make default backend (#28763); Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> | 2025-11-18 20:06:21 -08:00 |
| Strahinja Stamenkovic | 814843e021 | Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307); Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com> | 2025-11-19 03:12:31 +00:00 |
| Li, Jiang | 20852c8f4c | [CPU] Refactor CPU WNA16 (#28826); Signed-off-by: jiang1.li <jiang1.li@intel.com> | 2025-11-19 10:32:00 +08:00 |
| Jialin Ouyang | 40b6b38f2c | [Core] Switch Flat logprob control from environment variable to SamplingParams (#28914); Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>; Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com> | 2025-11-19 02:10:02 +00:00 |
| Jerry Zhang | da94c7c0eb | Move online quantization to model.load_weights (#26327); Signed-off-by: Jerry Zhang <jerryzh168@gmail.com> | 2025-11-18 16:52:41 -08:00 |
| tomeras91 | 1395461f5f | [Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587); Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> | 2025-11-18 16:49:36 -08:00 |
| Varun Sundar Rabindranath | 9912b8ccb8 | [Build] Add OpenAI triton_kernels (#28788); Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>; Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> | 2025-11-18 16:45:20 -08:00 |
| Johnny | 49ef847aa8 | [NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938); Signed-off-by: johnnynunez <johnnynuca14@gmail.com>; Signed-off-by: Johnny <johnnynuca14@gmail.com> | 2025-11-18 16:44:27 -08:00 |
| Michael Goin | 67745d189f | Supress verbose logs from model_hosting_container_standards (#28949); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-11-18 12:29:06 -08:00 |
| Kunshang Ji | 2a2d5d2780 | Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985); Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> | 2025-11-18 11:34:36 -08:00 |
| Chendi.Xue | c3e2978620 | [NIXL] fix cpu PD after physical <> logical block_size PR (#28904); Signed-off-by: Chendi Xue <chendi.xue@intel.com> | 2025-11-18 14:03:23 -05:00 |
| Isotr0py | e4bb2684bc | [Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842); Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-11-18 18:56:04 +00:00 |
| Kevin H. Luu | c64c0b78de | [chore] Move the rest of wikimedia url to S3 (#28921); Signed-off-by: Kevin H. Luu <khluu000@gmail.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-11-18 09:44:18 -08:00 |
| vllmellm | 0af3d4f0df | [FEAT] [AITER] [ROCm] integrate aiter sampling ops (#26084); Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> | 2025-11-18 17:28:34 +00:00 |
| Nick Hill | da8dadf68b | [Minor] Rename ec_producer field to is_ec_producer (#28884); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-11-18 17:26:07 +00:00 |
| Nicolò Lucchesi | f226a3f0c1 | [CI][NIXL] Change default block_size for tests (#28927); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-11-18 09:22:30 -08:00 |
| Luciano Martins | c2612371ad | [Model] Add Gemma3 GGUF multimodal support (#27772); Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>; Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>; Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>; Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-11-18 08:56:29 -08:00 |
| Ido Segev | 49a986ecd4 | [Benchmark] multi_turn: Report warmup-inclusive runtime (#28937); Signed-off-by: Ido Segev <idos@pliops.com> | 2025-11-18 16:38:22 +00:00 |
| Alex | f6aa122698 | [CI Sprint] Quantization CI Cleanup (#24130); Signed-off-by: Alex Yun <alexyun04@gmail.com> | 2025-11-18 09:21:48 -05:00 |
| Nicolò Lucchesi | 184b12fdc6 | [Bugfix][NIXL] Fix block_size_ratio when logical !=physical blocks (#28925); Signed-off-by: NickLucche <nlucches@redhat.com>; Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> | 2025-11-18 22:07:50 +08:00 |
| Canlin Guo | b9489f51e1 | [Model][Perf] Use cos and sin cache in QwenVL (#28798); Signed-off-by: gcanlin <canlinguosdu@gmail.com> | 2025-11-18 11:51:54 +00:00 |
| Song Zhixin | 285eaa4285 | [Bugfix] Safeguard against missing backend in AttentionBackendEnum (#28846); Signed-off-by: jesse <szxfml@gmail.com>; Signed-off-by: Song Zhixin <szxfml@gmail.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-11-18 10:53:44 +00:00 |
| Nick Hill | 439368496d | [BugFix] Fix PP/async scheduling with pooling models (#28899); Signed-off-by: Nick Hill <nhill@redhat.com>; Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> | 2025-11-18 00:20:45 -08:00 |
| Isotr0py | 896e41ae04 | [CI/Build] Replace wikipedia url with local server ones (#28908); Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-11-18 08:10:55 +00:00 |
| Kuntai Du | 5bb1da5190 | [MISC] Remove format.sh (#28906); Signed-off-by: Kuntai Du <kuntai@uchicago.edu> | 2025-11-18 05:28:31 +00:00 |
| Nick Hill | 5bdd155277 | [CI] Fix async scheduling + spec decoding test flake (#28902); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-11-18 05:26:32 +00:00 |
| Ning Xie | 0168f69e50 | [Misc] Remove unnecessary parentheses from log statements (#28897); Signed-off-by: Andy Xie <andy.xning@gmail.com> | 2025-11-17 20:33:46 -08:00 |
| Didier Durand | 083cf326dc | [Doc]: fix typos in various files (#28863); Signed-off-by: Didier Durand <durand.didier@gmail.com> | 2025-11-17 20:32:14 -08:00 |
| Cyrus Leung | bf9e1e8767 | [Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields (#28872); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-11-17 20:30:29 -08:00 |
| Wentao Ye | 3ddcf46011 | [Refactor] Remove Unused Func in Batch Invariant (#28881); Signed-off-by: yewentao256 <zhyanwentao@126.com> | 2025-11-17 20:29:29 -08:00 |
| xuebwang-amd | d0a73620cc | [ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638); Signed-off-by: xuebwang-amd <xuebwang@amd.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-11-18 11:16:45 +08:00 |