lkchen
6685890d11
[Fix] Move "model_config" as keyword args in chat_utils.py ( #18098 )
...
Signed-off-by: Linkun <github@lkchen.net>
2025-05-13 23:27:26 -07:00
Ecthlion_zyy
33011318c2
Fix broken example: examples/offline_inference/profiling at scheduler_config ( #18117 )
2025-05-13 23:19:14 -07:00
qli88
4f8b373225
[BugFix][AMD] Compatible patch for AITER lib after 04/20 ( #17912 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-05-13 23:05:20 -07:00
Charlie Fu
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm ( #18082 )
...
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-13 22:13:56 -07:00
vllmellm
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 ( #17955 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 22:03:47 -07:00
Michael Goin
12e6c0b41c
[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig ( #18086 )
2025-05-13 20:36:17 -07:00
Michael Goin
9a2a6357de
[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models ( #18026 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-13 19:48:33 -07:00
youkaichao
6266c57bae
[core][distributed] add ep group and all2all interface ( #18077 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-05-14 10:46:49 +08:00
Jon Gill
754b699cbe
[Bug]: Fix S3 model/tokenizer path resolution ( #18083 )
...
Signed-off-by: Jon Gill <jon@yurts.ai>
2025-05-13 19:34:17 -07:00
Roger Wang
6e27c6d86b
[Misc] Remove unused numpy tensor ( #18084 )
...
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-05-13 19:33:40 -07:00
Nick Hill
d5af47a149
[P/D] Add some more debug logs to NixlConnector ( #18102 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-13 19:33:03 -07:00
Pavani Majety
65f0f74b66
[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile ( #18101 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-05-13 19:33:00 -07:00
Luka Govedič
176a95c670
[Fix] Support CUDAGraph capture for encoder-decoder on ROCm ( #18104 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-05-13 19:31:42 -07:00
Chen Zhang
f2ae883b67
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager ( #18001 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-13 19:09:39 -07:00
vllmellm
40de1ef455
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature ( #14968 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 19:08:20 -07:00
Russell Bryant
0189a65a2e
[Docs] Expand security doc with firewall info ( #18081 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-13 19:36:00 +00:00
Nick Hill
55aa7af994
[V1] DP scale-out (2/N): Decouple engine process management and comms ( #15977 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-13 10:48:21 -07:00
Harry Mellor
0b217da646
Update deprecated type hinting in vllm/adapter_commons ( #18073 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 08:32:51 -07:00
Harry Mellor
19324d660c
Update deprecated type hinting in vllm/compilation ( #18072 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 08:32:48 -07:00
Harry Mellor
fc407a1425
Give auto-merge label workflow permission to add labels to issues ( #18078 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 07:53:13 -07:00
Harry Mellor
009d9e7590
Convert benchmarks to ruff format ( #18068 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 13:43:29 +00:00
Cyrus Leung
b922c2ebd2
[Bugfix] Fix entrypoints metrics tests ( #18063 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-13 06:42:43 -07:00
Russell Bryant
00b14e0f16
[CI] set token permissions for pre-commit CI job ( #17729 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:30 +00:00
Russell Bryant
54e467e6f8
[CI] Add token permissions for add-ready-label CI job ( #17730 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:13 +00:00
Russell Bryant
79a1d25bbd
[CI] Add workflow permissions for helm CI job ( #17727 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:49:07 +00:00
Russell Bryant
9944011b30
[CI] Set token permissions for reminder comment CI job ( #17728 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:46:58 +00:00
Harry Mellor
8c946cecca
Update deprecated type hinting in vllm/transformers_utils ( #18058 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:34:37 -07:00
Harry Mellor
ff334ca1cd
Update deprecated type hinting in vllm/profiler ( #18057 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:34:34 -07:00
Harry Mellor
6223dd8114
Update deprecated type hinting in model_executor/layers ( #18056 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:17:23 -07:00
Reid
906f0598fc
[doc] add download/list/delete HF model CLI usage ( #17940 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-13 11:15:51 +00:00
Aaron Pham
cb528d0585
[Fix] check to make sure processor has chat templates ( #18047 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-13 03:04:10 -07:00
Harry Mellor
98fcba1575
Convert .buildkite to ruff format ( #17656 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 09:28:31 +00:00
Russell Bryant
23b3134eb5
[Benchmarks] Refactor run_structured_output_benchmarks.sh ( #17722 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-13 01:47:29 -07:00
Michael Goin
ea6ae8cb45
[Bugfix] Fix marlin moe fallback logic for llama4 ( #18042 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-13 07:53:28 +00:00
Woosuk Kwon
2ff297dce9
[BugFix] Set default random seed to 0 for V1 ( #17929 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-13 07:52:19 +00:00
Jin Huang
8dd0671bac
[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP ( #17916 )
...
Signed-off-by: Jin Huang <jinhun@amazon.com>
Co-authored-by: Jin Huang <jinhun@amazon.com>
2025-05-13 15:10:07 +08:00
Chen Zhang
f0d610a8ae
[v1][KVCacheManager] Avoid full cache hit by controlling max_length ( #17999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-13 06:50:38 +00:00
Driss Guessous
e57e4d6e9e
Fix Broken macro for cutlass moe ( #18049 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-05-12 23:31:06 -07:00
Nick Hill
ee5be834e7
[BugFix] Fix 4-GPU RLHF tests ( #18007 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-12 23:03:55 -07:00
Calvin Chen
48545728d8
cleanup invalid prints ( #18050 )
...
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-12 23:01:57 -07:00
Chauncey
dc1a821768
[Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. ( #17845 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-12 23:01:31 -07:00
Cyrus Leung
61e0a506a3
[Bugfix] Avoid repeatedly creating dummy data during engine startup ( #17935 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-12 22:40:19 -07:00
Michael Goin
1df491c522
[Bugfix] Fixes for new marlin moe usage ( #18017 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-13 03:50:04 +00:00
Arjun Kathuria
d8487ef557
[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 ( #13779 )
...
Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com>
2025-05-12 20:36:33 -07:00
Jee Jee Li
c06af9a959
[Misc] Slight spelling modification ( #18039 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-12 20:36:27 -07:00
Tao He
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support ( #11844 )
2025-05-12 19:52:47 -07:00
hissu-hyvarinen
f6518b2b48
[ROCm] Skip tests for quantizations incompatible with ROCm ( #17905 )
...
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
2025-05-12 18:39:28 -06:00
Harry Mellor
d67085c2c8
Remove noisy warnings from SchedulerConfig ( #17995 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 00:33:45 +00:00
Michael Goin
307939f299
Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 ( #18000 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2025-05-12 18:07:34 -06:00
Harry Mellor
9d7ea9dbbf
Update some more deprecated type hinting ( #17998 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-12 23:49:33 +00:00