Gregory Shtrasberg
e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation ( #11906 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
youkaichao
6e650f56a1
[torch.compile] decouple compile sizes and cudagraph sizes ( #12243 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-24 02:01:30 +08:00
youkaichao
3f50c148fd
[core] add wake_up doc and some sanity check ( #12361 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-24 02:00:50 +08:00
Isotr0py
8c01b8022c
[Bugfix] Fix broken internvl2 inference with v1 ( #12360 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-23 17:20:33 +00:00
Roger Wang
99d01a5e3d
[V1] Simplify M-RoPE ( #12352 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: imkero <kerorek@outlook.com>
2025-01-23 23:13:23 +08:00
Lucas Wilkinson
978b45f399
[Kernel] Flash Attention 3 Support ( #12093 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-23 06:45:48 -08:00
Isotr0py
c5b4b11d7f
[Bugfix] Fix k_proj's bias for whisper self attention ( #12342 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-23 10:15:33 +00:00
Cody Yu
f0ef37233e
[V1] Add uncache_blocks ( #12333 )
2025-01-23 04:19:21 +00:00
rasmith
68c4421b6d
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD ( #12282 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-01-23 00:10:37 +00:00
Nick Hill
aea94362c9
[Frontend][V1] Online serving performance improvements ( #12287 )
2025-01-22 22:22:12 +00:00
Cody Yu
7206ce4ce1
[Core] Support reset_prefix_cache ( #12284 )
2025-01-22 18:52:27 +00:00
Konrad Zawora
96f6a7596f
[Bugfix] Fix HPU multiprocessing executor ( #12167 )
...
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-01-23 02:07:07 +08:00
Jee Jee Li
84bee4bd5c
[Misc] Improve the readability of BNB error messages ( #12320 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-22 16:56:54 +00:00
Robin
fc66dee76d
[Misc] Fix the error in the tip for the --lora-modules parameter ( #12319 )
...
Signed-off-by: wangerxiao <863579016@qq.com>
2025-01-22 16:48:41 +00:00
Cyrus Leung
6609cdf019
[Doc] Add docs for prompt replacement ( #12318 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-22 14:56:29 +00:00
Roger Wang
16366ee8bb
[Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 ( #12313 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-22 21:06:36 +08:00
zhou fan
528dbcac7d
[Model][Bugfix]: correct Aria model output ( #12309 )
...
Signed-off-by: xffxff <1247714429@qq.com>
2025-01-22 11:39:19 +00:00
Cyrus Leung
cd7b6f0857
[VLM] Avoid unnecessary tokenization ( #12310 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-22 11:08:31 +00:00
youkaichao
68ad4e3a8d
[Core] Support fully transparent sleep mode ( #11743 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-22 14:39:32 +08:00
youkaichao
66818e5b63
[core] separate builder init and builder prepare for each batch ( #12253 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-22 14:13:52 +08:00
Cyrus Leung
cbdc4ad5a5
[Ci/Build] Fix mypy errors on main ( #12296 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-22 12:06:54 +08:00
Kevin H. Luu
64ea24d0b3
[ci/lint] Add back default arg for pre-commit ( #12279 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2025-01-22 01:15:27 +00:00
Cyrus Leung
df76e5af26
[VLM] Simplify post-processing of replacement info ( #12269 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 16:48:13 -08:00
Aleksandr Malyshev
69196a9bc7
[BUGFIX] When skip_tokenize_init and multistep are set, execution crashes ( #12277 )
...
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: maleksan85 <maleksan@amd.com>
2025-01-21 23:30:46 +00:00
Jani Monoses
9c485d9e25
[Core] Free CPU pinned memory on environment cleanup ( #10477 )
2025-01-21 11:56:41 -08:00
wangxiyuan
fa9ee08121
[Misc] Set default backend to SDPA for get_vit_attn_backend ( #12235 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-21 11:52:11 -08:00
Adrian Cole
347eeebe3b
[Misc] Remove experimental dep from tracing.py ( #12007 )
...
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
2025-01-21 11:51:55 -08:00
Andy Lo
18fd4a8331
[Bugfix] Multi-sequence broken ( #11898 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-01-21 11:51:35 -08:00
Ricky Xu
132a132100
[v1][stats][1/n] Add RequestStatsUpdate and RequestStats types ( #10907 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2025-01-21 11:51:13 -08:00
Jannis Schönleber
9705b90bcf
[Bugfix] fix race condition that leads to wrong order of token returned ( #10802 )
...
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
2025-01-21 09:47:04 -08:00
Mengqing Cao
c64612802b
[Platform] improve platforms getattr ( #12264 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-01-21 14:42:41 +00:00
Roger Wang
b197a5ccfd
[V1][Bugfix] Fix data item ordering in mixed-modality inference ( #12259 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-21 13:18:43 +00:00
youkaichao
c81081fece
[torch.compile] transparent compilation with more logging ( #12246 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-21 19:32:55 +08:00
Cyrus Leung
a94eee4456
[Bugfix] Fix mm_limits access for merged multi-modal processor ( #12252 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 10:09:39 +00:00
Cyrus Leung
f2e9f2a3be
[Misc] Remove redundant TypeVar from base model ( #12248 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 08:40:39 +00:00
Jee Jee Li
1f1542afa9
[Misc]Add BNB quantization for PaliGemmaForConditionalGeneration ( #12237 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-21 07:49:08 +00:00
Cyrus Leung
96912550c8
[Misc] Rename MultiModalInputsV2 -> MultiModalInputs ( #12244 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 07:31:19 +00:00
Nicolò Lucchesi
5fe6bf29d6
[BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 ( #12230 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-21 12:23:14 +08:00
Gregory Shtrasberg
d4b62d4641
[AMD][Build] Porting dockerfiles from the ROCm/vllm fork ( #11777 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-01-21 12:22:23 +08:00
Jinzhen Lin
750f4cabfa
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) ( #12222 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00
Cheng Kuan Yong Jason
06a760d6e8
[bugfix] catch xgrammar unsupported array constraints ( #12210 )
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
2025-01-20 16:42:02 -08:00
youkaichao
da7512215f
[misc] add cuda runtime version to usage data ( #12190 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-21 00:31:01 +00:00
Işık
af69a6aded
fix: update platform detection for M-series arm based MacBook processors ( #12227 )
...
Signed-off-by: isikhi <huseyin.isik000@gmail.com>
2025-01-20 22:23:28 +00:00
wangxiyuan
86bfb6dba7
[Misc] Pass attention to impl backend ( #12218 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-20 23:25:28 +08:00
Chen Zhang
5f0ec3935a
[V1] Remove _get_cache_block_size ( #12214 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-20 21:54:16 +08:00
youkaichao
c222f47992
[core][bugfix] configure env var during import vllm ( #12209 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 19:35:59 +08:00
Cyrus Leung
b37d82791e
[Model] Upgrade Aria to transformers 4.48 ( #12203 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:58:48 +08:00
Cyrus Leung
59a0192fb9
[Core] Interface for accessing model from VllmRunner ( #10353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
Isotr0py
83609791d2
[Model] Add Qwen2 PRM model support ( #12202 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-20 14:59:46 +08:00
Yuan Tang
0974c9bc5c
[Bugfix] Fix incorrect types in LayerwiseProfileResults ( #12196 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:59:20 +08:00