Michael Goin
dae9ec464c
Temporarily disable test_awq_gemm_opcheck ( #14251 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 06:10:35 +00:00
youkaichao
6eaf93020d
[platforms] improve rocm debugging info ( #14257 )
2025-03-04 21:32:18 -08:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention ( #13931 )
2025-03-04 21:27:26 -08:00
Congcong Chen
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct ( #14119 )
2025-03-04 20:57:01 -08:00
Cody Yu
ade3f7d988
[V1][Bugfix] Do not reset prefix caching metrics ( #14235 )
2025-03-05 04:39:13 +00:00
rainkert
0df25101d6
[Bugfix] Fix gptq_marlin for deepseek-v3 ( #13750 )
...
Signed-off-by: dangshunya <dangshunya@baichuan-inc.com>
Co-authored-by: dangshunya <dangshunya@baichuan-inc.com>
2025-03-05 12:25:53 +08:00
Michael Goin
e123aafdf0
Disable GPTQ AllSpark kernels for CUDA Compiler < 12.0 ( #14157 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 12:25:24 +08:00
Nishidha
5b143d33be
Moved numba from common requirements to cuda/rocm specific requirements ( #14199 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-03-05 12:25:00 +08:00
youkaichao
eb59b5a6cb
[misc] announce china meetup ( #14248 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-05 10:33:50 +08:00
Michael Goin
fbfc3ee37e
[V1][TPU] TPU multimodal model support for ragged attention ( #14158 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-03-04 19:58:48 -05:00
Sage Moore
3e1d223626
[ROCm] Disable a few more kernel tests that are broken on ROCm ( #14145 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-04 23:37:55 +00:00
Tyler Michael Smith
4f5b059f14
Clean up unused padding_idx variables across many model definitions ( #13240 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-04 21:27:00 +00:00
Kuntai Du
288ca110f6
[Security] Serialize using safetensors instead of pickle in Mooncake Pipe ( #14228 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-03-04 21:10:32 +00:00
Mark McLoughlin
c2bd2196fc
[v1][Metrics] Add design doc ( #12745 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 20:36:55 +00:00
Michael Goin
550c7ba3dc
[Docs] Update Dockerfile dependency image ( #14215 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 20:22:11 +00:00
Harry Mellor
e5b2f1601a
[Frontend] Do prompt_logprobs clamping for chat as well as completions ( #14225 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-04 20:13:06 +00:00
Harry Mellor
9badee53de
Fix performance when --generation-config is not None ( #14223 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-04 20:59:22 +01:00
Siyuan Liu
beebf4742a
[TPU][Profiler] Support start_profile/stop_profile in TPU worker ( #13988 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-04 14:40:06 -05:00
kushanam
f89978ad7c
add cutlass support for blackwell fp8 gemm ( #13798 )
2025-03-04 07:55:07 -08:00
lkchen
b3cf368d79
[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py ( #14161 )
2025-03-04 15:43:59 +00:00
Mark McLoughlin
c8525f06fc
[V0][Metrics] Deprecate some questionable request time metrics ( #14135 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-04 15:11:33 +00:00
Nick Hill
5db6b2c961
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs ( #13869 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-04 15:06:47 +00:00
Michael Goin
6247bae6c6
[Bugfix] Restrict MacOS CPU detection ( #14210 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 22:25:27 +08:00
youkaichao
3610fb4930
[doc] add "Failed to infer device type" to faq ( #14200 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 20:47:06 +08:00
youkaichao
71c4b40562
[sleep mode] error out with expandable_segments ( #14189 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:54:19 +08:00
youkaichao
ac65bc92df
[platform] add debug logging during inferring the device type ( #14195 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:39:16 +08:00
Michael Goin
f78c0be80a
Fix benchmark_moe.py tuning for CUDA devices ( #14164 )
2025-03-03 21:11:03 -08:00
Zhanwen Chen
66233af7b6
Use math.prod instead of np.prod for trivial ops ( #14142 )
2025-03-03 21:09:22 -08:00
Rui Qiao
bf13d40972
[core] Pass all driver env vars to ray workers unless excluded ( #14099 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-04 11:44:17 +08:00
Cody Yu
989f4f430c
[Misc] Remove lru_cache in NvmlCudaPlatform ( #14156 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 11:09:34 +08:00
Divakar Verma
bb5b640359
[core] moe fp8 block quant tuning support ( #14068 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-03-04 01:30:23 +00:00
Travis Johnson
c060b71408
[Model] Add support for GraniteMoeShared models ( #13313 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-04 08:04:52 +08:00
iefgnoix
79e4937c65
[v1] Add comments to the new ragged paged attention Pallas kernel ( #14155 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-03-03 23:00:55 +00:00
Qubitium-ModelCloud
cd1d3c3df8
[Docs] Add GPTQModel ( #14056 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-03 21:59:09 +00:00
Michael Goin
19d98e0c7d
[Kernel] Optimize moe intermediate_cache usage ( #13625 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-03 16:29:53 -05:00
Michael Goin
2b04c209ee
[Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 ( #14100 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-03 14:20:24 -07:00
Mark McLoughlin
ae122b1cbd
[WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics ( #14055 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 19:04:45 +00:00
Nick Hill
872db2be0e
[V1] Simplify stats logging ( #14082 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-03 10:34:14 -08:00
Mark McLoughlin
2dfdfed8a0
[V0][Metrics] Deprecate some KV/prefix cache metrics ( #14136 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 18:25:46 +00:00
Mark McLoughlin
c41d27156b
[V0][Metrics] Remove unimplemented vllm:tokens_total ( #14134 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 17:50:22 +00:00
Harry Mellor
91373a0d15
Fix head_dim not existing in all model configs (Transformers backend) ( #14141 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:48:11 +00:00
TJian
848a6438ae
[ROCm] Faster Custom Paged Attention kernels ( #12348 )
2025-03-03 09:24:45 -08:00
Harry Mellor
98175b2816
Improve the docs for TransformersModel ( #14147 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:03:05 +00:00
Mark McLoughlin
4167252eaf
[V1] Refactor parallel sampling support ( #13774 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 08:15:27 -08:00
Cody Yu
f35f8e2242
[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 ( #13921 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-03 16:43:14 +08:00
Mengqing Cao
b87c21fc89
[Misc][Platform] Move use allgather to platform ( #14010 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-03 15:40:04 +08:00
wang.yuqi
e584b85afd
[Misc] duplicate code in deepseek_v2 ( #14106 )
2025-03-03 14:10:11 +08:00
Sheng Yao
09e56f9262
[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure ( #14051 )
2025-03-02 17:35:01 -08:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Ce Gao
bf33700ecd
[v0][structured output] Support reasoning output ( #12955 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-02 14:49:42 -05:00