Louie Tsai
|
e41c10d5cf
|
Update dashboard.md and Update README.md to remove duplicated section
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 11:12:58 -08:00 |
|
Tsai, Louie
|
ff80f1427a
|
remove enforce-eager according to feedback.
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:44:05 -08:00 |
|
Louie Tsai
|
b00fd3592e
|
Update dashboard.md for perf_comparison.html report update
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
898e868d28
|
fix a multiple TP/PP size comparison issue in a table
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
76862427f1
|
pre-commit fix
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
f825a14d56
|
add sizing table
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
db9aaa61ac
|
minor function name change
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
0e01150cb4
|
group-first report instead of data-column-first
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
63ebc2336d
|
code refactor to improve readability
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
efa495545c
|
highlight ratio in throughput
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
763d48dbcb
|
highlight ratio for TTFT and TPOT
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
ba0bf189c8
|
improve table readability
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Tsai, Louie
|
b735255f17
|
improve cpu tests for 0.12.0
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-12-24 10:32:02 -08:00 |
|
Cyrus Leung
|
09dc7c690c
|
[Chore][1/2] Drop v0.14 deprecations (#31285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 09:54:01 -08:00 |
|
ゆり
|
506eb0f454
|
[Bugfix] Remove dead block_quant_to_tensor_quant function (#31294)
Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2025-12-24 17:22:48 +00:00 |
|
Ning Xie
|
5d93089686
|
[cli] complete vllm cli help message (#31226)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-12-24 15:45:47 +00:00 |
|
Kevin McKay
|
66c9887440
|
[Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-24 10:37:11 -05:00 |
|
wang.yuqi
|
1ff67df182
|
[CI] Reorganization pooling_mteb_test (#31265)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-12-24 23:36:20 +08:00 |
|
skaraban3807
|
7cd288a4b3
|
[PERF] Add interleaved memory allocation to NUMA module (#30800)
|
2025-12-24 13:47:49 +00:00 |
|
Cyrus Leung
|
d201807339
|
[Chore] Bump lm-eval version (#31264)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 05:39:13 -08:00 |
|
Cyrus Leung
|
aa3868ecfe
|
[Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 05:38:46 -08:00 |
|
Cyrus Leung
|
7adeb4bfa8
|
[Bugfix] Fix max_model_len="auto" handling (#31260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 19:15:27 +08:00 |
|
wang.yuqi
|
bd89ce16d2
|
[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-12-24 09:54:57 +00:00 |
|
Pleaplusone
|
b41aeb3468
|
[Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-12-24 16:47:44 +08:00 |
|
Ryan Rock
|
ddfac7034e
|
[CI/Build] Ignore data_parallel_size_local (#30281)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-12-24 07:40:54 +00:00 |
|
Micah Williamson
|
6559d96796
|
[ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-24 07:19:07 +00:00 |
|
kliuae
|
1c74150bca
|
[ROCm][CI] Fix "Distributed Tests (H200)" Test (#31227)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2025-12-24 06:56:30 +00:00 |
|
Andreas Karatzas
|
0247a91e00
|
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-23 22:42:30 -08:00 |
|
Michael Goin
|
8ee90c83f8
|
Add --max-model-len auto to auto-fit context to available memory (#29431)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-23 21:37:14 -08:00 |
|
Nick Cao
|
d7e05ac743
|
[docker] Fix downloading sccache on aarch64 platform (#30070)
Signed-off-by: Nick Cao <nickcao@nichi.co>
|
2025-12-23 21:36:33 -08:00 |
|
sihao_li
|
471ddb99a0
|
[XPU] Remove distributed_executor_backend check (#30760)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-12-23 21:34:33 -08:00 |
|
Xiong Wang
|
bb24592d13
|
[Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007)
Signed-off-by: Xiong Wang <wangxiongts@163.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-12-23 21:33:54 -08:00 |
|
Matthew Bonanni
|
369f47aa0f
|
[DeepSeek v3.2] Remove unnecessary syncwarps (#31047)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-23 21:33:30 -08:00 |
|
zejunchen-zejun
|
dabff12ed3
|
[Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device (#31149)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-12-23 21:32:19 -08:00 |
|
Ming Yang
|
3bb9561928
|
Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-12-23 21:17:23 -08:00 |
|
Micah Williamson
|
3ce791ac77
|
[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI (#31242)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-24 03:21:50 +00:00 |
|
Andreas Karatzas
|
e42894f5b5
|
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-24 02:56:58 +00:00 |
|
Wentao Ye
|
76e6a95192
|
[Bug] Fix Number of dimensions of tensors must match. for Deepseek V3.2 (#31160)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-24 10:41:09 +08:00 |
|
Chao Lei
|
8b59753cdb
|
[P/D] Mooncake connector support more protocols (#30133)
Signed-off-by: LCAIZJ <leichao139636@163.com>
|
2025-12-24 10:24:07 +08:00 |
|
Chen Zhang
|
538e830caa
|
[KVEvent] Use request.block_hash for parent block_hash (#30544)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2025-12-23 18:23:43 -08:00 |
|
rongfu.leng
|
4ed11105d7
|
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla (#30967)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-12-23 18:22:35 -08:00 |
|
Cyrus Leung
|
dd424571c8
|
[Bugfix] Enable dynamic_dims for different embeds shape (#31223)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 10:15:47 +08:00 |
|
Cyrus Leung
|
ca6a95ba25
|
[Chore] Simplify logic of _execute_mm_encoder (#31222)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-23 18:15:16 -08:00 |
|
Vadim Gimpelson
|
bc0a5a0c08
|
[CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-12-23 17:21:50 -08:00 |
|
Andreas Karatzas
|
bfa2c0bbb9
|
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() (#31203)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-23 21:48:01 +00:00 |
|
Mark McLoughlin
|
f790068600
|
[Core] Add a random suffix to frontend-provided request IDs (#27987)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-12-23 13:05:39 -08:00 |
|
Asaf Joseph Gardin
|
34916ae37f
|
[Mamba] - Consolidate Mambas Attention Logic (#28133)
|
2025-12-23 21:57:00 +01:00 |
|
Yuan Tang
|
0736f901e7
|
docs: Add llm-d integration to the website (#31234)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-12-23 20:27:22 +00:00 |
|
Harry Mellor
|
c016c95b45
|
Use helper function instead of looping through attribute names (#29788)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-23 17:31:56 +00:00 |
|
Harry Mellor
|
1339878e13
|
Only patch original_max_position_embeddings for Transformers v4 (#31214)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-23 16:46:32 +00:00 |
|