courage17340 | b1308b84a3 | [Model][VLM] Add Kimi-VL model support (#16387); Signed-off-by: courage17340 <courage17340@163.com> | 2025-04-14 21:41:48 +00:00
Nicolò Lucchesi | b3f2fddd17 | [TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-14 17:01:05 +00:00
Cyrus Leung | aa29841ede | [Bugfix] Multi-modal caches not acting like LRU caches (#16593); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-14 09:24:16 -07:00
shangmingc | 1dd23386ec | [Misc] Update usage with mooncake lib for kv transfer (#16523); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-04-14 11:31:37 +00:00
DefTruth | ce4ddd2d1a | [Misc] remove warning if triton>=3.2.0 (#16553); Signed-off-by: DefTruth <qiustudent_r@163.com> | 2025-04-14 02:39:47 -07:00
Harry Mellor | e51929ebca | Improve configs - SchedulerConfig (#16533); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-14 17:24:16 +08:00
Russell Bryant | dc1b4a6f13 | [Core][V0] Enable regex support with xgrammar (#13228); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-04-14 10:13:38 +08:00
Michael Goin | d085a44082 | Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-13 14:55:18 +00:00
Lily Liu | f49e5aff11 | [V1][Spec Decode] KV cache slots for eagle heads (#16370); Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com> | 2025-04-12 19:42:51 -07:00
Ryan McConville | 6c11ecf8d3 | [Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529); Signed-off-by: Ryan McConville <ryan@ryanmcconville.com> | 2025-04-12 20:19:19 +00:00
SnowCharm | 93e5f3c5fb | [Perf] Optimize Preparing Inputs for GPU Model Runner (#16484); Signed-off-by: snowcharm <snowcharmqq@gmail.com>; Co-authored-by: Nick Hill <nhill@redhat.com> | 2025-04-12 22:54:37 +08:00
Jie Fu (傅杰) | 70363bccfa | Fix syntaxWarning: invalid escape sequence '\s' (#16532); Signed-off-by: Jie Fu <jiefu@tencent.com> | 2025-04-12 14:39:42 +00:00
Huazhong Ji | 68bb122eb4 | [MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464); Signed-off-by: hzji210@gmail.com <hzji210@gmail.com> | 2025-04-12 09:20:25 +00:00
Cyrus Leung | d9fc8cd9da | [V1] Enable multi-input by default (#15799); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-12 08:52:39 +00:00
wang.yuqi | fbf722c6e6 | [Frontend] support matryoshka representation / support embedding API dimensions (#16331) | 2025-04-11 23:23:10 -07:00
leon-seidel | e92d7085bf | [Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516); Signed-off-by: Leon Seidel <leon.seidel@fau.de> | 2025-04-11 23:22:07 -07:00
Michael Goin | bd6028d6b0 | Optimized topk for topk=1 (Llama-4) (#16512); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-12 14:21:08 +08:00
Nick Hill | 41cc883c29 | [BugFix] Handle non-contiguous tensors properly when serializing (#16492); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-04-11 17:54:06 -07:00
Michael Goin | 87b836ba77 | Bugfix for PixtralHF models without spatial_merge_size (#16513); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-11 23:32:22 +00:00
rongfu.leng | 56c76c2e0e | [Bugfix] clean up duplicated code (#16485); Signed-off-by: Gogs <gogs@fake.local>; Co-authored-by: Gogs <gogs@fake.local> | 2025-04-11 23:19:40 +00:00
Yong Hoon Shin | a3bf8d4a2b | [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488) | 2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi | 16eda8c43a | [Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463); Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>; Co-authored-by: Kai Wu <kaiwu@meta.com> | 2025-04-12 06:26:17 +08:00
Harry Mellor | cd77382ac1 | Improve configs - LoadConfig (#16422); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-11 20:27:27 +00:00
Travis Johnson | 71b9cde010 | [Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2025-04-11 19:59:50 +00:00
Michael Goin | f41647ee6b | [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-11 17:54:08 +00:00
Nicolò Lucchesi | 4d022cbc75 | [TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-11 17:06:14 +00:00
Richard Zou | 70de35a881 | Fix erroneous "model doesn't support compile" warning (#16486); Signed-off-by: rzou <zou3519@gmail.com> | 2025-04-11 16:24:36 +00:00
Tomasz Zielinski | 34b2cf3b33 | [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779); Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com> | 2025-04-11 07:38:36 -07:00
chaow-amd | 9e90c9f73f | [Bugfix] Fix bugs of running Quark quantized models (#16236); Signed-off-by: chaow <chaow@amd.com> | 2025-04-11 10:18:32 -04:00
DefTruth | e9528f6dc6 | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173); Signed-off-by: DefTruth <qiustudent_r@163.com> | 2025-04-11 06:50:50 -06:00
Jee Jee Li | a26f59ccbc | [Misc] Raise error for V1 not supporting Long LoRA. (#16415); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-04-11 01:51:20 -07:00
Michael Goin | aa3b3d76e0 | Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-11 08:09:52 +00:00
Jee Jee Li | f7030df3be | [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-04-11 15:32:37 +08:00
DefTruth | 905e91e9ac | Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453) | 2025-04-11 06:44:22 +00:00
Alex Brooks | f8f9c0ba62 | [Bugfix] Don't set an upper bound on repetition penalty (#16403); Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>; Co-authored-by: Nick Hill <nhill@redhat.com> | 2025-04-11 14:19:40 +08:00
Yong Hoon Shin | 99ef59cf7f | [Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439); Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>; Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com> | 2025-04-10 21:26:07 -07:00
Nicolò Lucchesi | 3cc9af88ff | [TPU][V1] Disable per-request seed/Generator (#16172); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-10 17:05:44 -04:00
Cyrus Leung | 56d4aefa33 | [VLM] Avoid unnecessary dummy multimodal data during processing (#16416); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 19:32:14 +00:00
Nick Hill | dd143ef541 | [V1] Zero-copy tensor/ndarray serialization/transmission (#13790); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-04-10 19:23:14 +00:00
Chih-Chieh Yang | daefed052c | [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423); Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>; Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com> | 2025-04-10 19:07:07 +00:00
Lily Liu | e8224f3dca | [V1][Spec Decode] Eagle Model loading (#16035); Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com> | 2025-04-10 11:21:48 -07:00
Russell Bryant | 9665313c39 | [V1] Set structured output backend to auto by default (#15724); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-04-10 17:53:26 +00:00
Harry Mellor | 0c54fc7273 | Improve configs - ParallelConfig (#16332); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-10 17:34:37 +00:00
Nicolò Lucchesi | c1b57855ec | [TPU][V1] Use language_model interface for getting text backbone in MM (#16410); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-10 17:32:04 +00:00
Cyrus Leung | 83b824c8b4 | [VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item (#16408); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 09:06:58 -07:00
Lu Fang | 7678fcd5b6 | Fix the torch version parsing logic (#15857) | 2025-04-10 07:37:47 -07:00
Ye (Charlotte) Qi | 61de3ef74b | [Model] Remove image mm limit for LLaMa4 (#16365); Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> | 2025-04-10 09:36:27 +00:00
Michael Goin | c70cf0fe06 | [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-10 15:08:47 +08:00
Cyrus Leung | a5d11a54dc | [Bugfix] Fix validation error for text-only Mllama 3.2 (#16377); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 14:19:42 +08:00
Aaron Ang | a9bd832fc5 | [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383); Signed-off-by: Aaron Ang <aaron.angyd@gmail.com> | 2025-04-09 23:01:00 -07:00