Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
Mengqing Cao
|
fb22be5817
|
[moe][quant] add weight name case for offset (#15515)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-03-27 04:50:29 +00:00 |
|
Wei Zeng
|
7f301dd8ef
|
[Doc] Update V1 user guide for fp8 kv cache support (#15585)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-03-26 19:39:03 -07:00 |
|
Varun Sundar Rabindranath
|
8095341a01
|
[misc] LoRA: Remove unused long context test data (#15558)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-27 10:04:51 +08:00 |
|
Chenyaaang
|
69db16a46a
|
add platform check back (#15578)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
|
2025-03-27 01:50:27 +00:00 |
|
Michael Goin
|
ce78f9af4e
|
Add automatic tpu label to mergify.yml (#15560)
|
2025-03-26 21:39:58 -04:00 |
|
ElizaWszola
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
Matthew Vine
|
7a6d45bc8a
|
Support FIPS enabled machines with MD5 hashing (#15299)
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>
|
2025-03-26 20:19:46 -04:00 |
|
Chengji Yao
|
e74ff409e0
|
[TPU] support disabling xla compilation cache (#15567)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-27 00:09:28 +00:00 |
|
Wes
|
7a888271f5
|
Use Cache Hinting for fused_moe kernel (#15511)
|
2025-03-26 23:21:34 +00:00 |
|
Alexander Matveev
|
9d119a86ae
|
[V1] TPU CI - Fix test_compilation.py (#15570)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-26 21:51:54 +00:00 |
|
Alexander Matveev
|
b2e85e26f4
|
[V1] TPU - Revert to exponential padding by default (#15565)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-26 21:35:05 +00:00 |
|
Alexei-V-Ivanov-AMD
|
dd8a29da99
|
Applying some fixes for K8s agents in CI (#15493)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-03-26 20:35:11 +00:00 |
|
marko
|
27df5199d9
|
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-03-26 11:11:28 -07:00 |
|
Nick Hill
|
35fad35a48
|
[V1][Sampler] Faster top-k only implementation (#15478)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-26 10:56:47 -07:00 |
|
Aaron Pham
|
733e7c9e95
|
[Refactor] Remove unnecessary backend parameter in structured output interface (#15317)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-26 17:51:56 +00:00 |
|
Harry Mellor
|
0af4d764d6
|
Fix weight loading for some models in Transformers backend (#15544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-26 10:17:53 -07:00 |
|
youkaichao
|
e64afa455c
|
multi-node offline DP+EP example (#15484)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-26 23:54:24 +08:00 |
|
Alex Brooks
|
1711b929b6
|
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
|
2025-03-26 14:28:07 +00:00 |
|
Harry Mellor
|
c091c0a588
|
Improve validation of TP in Transformers backend (#15540)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-26 07:26:48 -07:00 |
|
cyyever
|
1aa162e030
|
Apply torchfix (#15532)
Signed-off-by: cyy <cyyever@outlook.com>
|
2025-03-26 12:09:06 +00:00 |
|
Harry Mellor
|
cf5c8f1686
|
Separate base model from TransformersModel (#15467)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-03-26 18:13:38 +08:00 |
|
Reid
|
4ec2cee000
|
[Misc] improve example script output (#15528)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-26 10:12:47 +00:00 |
|
wwl2755
|
99f536f830
|
[Misc] Enhance warning information to user-defined chat template (#15408)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-26 02:21:15 -07:00 |
|
vllmellm
|
5ebf66748b
|
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-26 16:30:30 +08:00 |
|
Bryan Lu
|
781d056280
|
[Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-26 08:24:07 +00:00 |
|
daniel-salib
|
5aefd6ac31
|
Fix raw_request extraction in load_aware_call decorator (#15382)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2025-03-25 22:29:54 -07:00 |
|
Varun Sundar Rabindranath
|
6c663dfd5e
|
[misc] LoRA - Skip LoRA kernels when not required (#15152)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-26 11:33:45 +08:00 |
|
Lucas Wilkinson
|
33437bc6e7
|
[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) (#15492)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-03-25 20:33:22 -07:00 |
|
Tyler Michael Smith
|
23114d3364
|
[Misc] Warn about v0 in benchmark_paged_attn.py (#15495)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-25 20:31:04 -07:00 |
|
Cyrus Leung
|
997c8811d6
|
[Model] Support multi-image for Molmo (#15438)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-26 11:26:33 +08:00 |
|
Harry Mellor
|
e42389f9d7
|
Transformers backend already supports V1 (#15463)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-25 20:26:16 -07:00 |
|
Varun Sundar Rabindranath
|
ff38f0a32c
|
[CI/Build] LoRA: Delete long context tests (#15503)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-25 17:18:34 -07:00 |
|
Varun Sundar Rabindranath
|
a5cfbab3c8
|
[Core] LoRA: V1 Scheduler optimization (#15422)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-25 22:50:09 +00:00 |
|
Chenyaaang
|
ac3cd6e83c
|
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-25 17:27:22 -04:00 |
|
Lu Fang
|
082ab86f5f
|
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-25 14:22:26 -07:00 |
|
Nick Hill
|
6aa196c8dc
|
[V1][Minor] Use SchedulerInterface type for engine scheduler field (#15499)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-25 14:21:36 -07:00 |
|
Nicolò Lucchesi
|
a0dd7dcd49
|
[TPU][V1] Fix Sampler recompilation (#15309)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-25 16:43:54 -04:00 |
|
Maximilien de Bayser
|
e977c11111
|
Add workaround for shared field_names in pydantic model class (#13925)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-03-25 20:31:08 +00:00 |
|
Joe Runde
|
5f063a80bd
|
[bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-25 15:00:32 -04:00 |
|
Antonio Gómez
|
5d8e1c9279
|
[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) (#15471)
Co-authored-by: ServerAI <ai@exc-mad-ai.com>
|
2025-03-25 17:59:25 +00:00 |
|
yarongmu-google
|
0a049c7d86
|
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-25 12:27:16 -04:00 |
|
youkaichao
|
d0cfec7ab9
|
[bugfix] fix inductor cache on max_position_embeddings (#15436)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-25 07:05:39 -07:00 |
|
Szymon Ożóg
|
a608160027
|
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-25 13:50:49 +00:00 |
|
Cyrus Leung
|
3f04a7fbf2
|
[Doc] Update V1 user guide for multi-modality (#15460)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 11:01:58 +00:00 |
|
Cyrus Leung
|
5994430b84
|
[Misc] Remove redundant num_embeds (#15443)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 18:27:57 +08:00 |
|
Cyrus Leung
|
a9e879b316
|
[Misc] Clean up MiniCPM-V/O code (#15337)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 10:22:52 +00:00 |
|
Md. Shafi Hussain
|
3e2f37a69a
|
Dockerfile.ppc64le changes to move to UBI (#15402)
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
|
2025-03-25 10:15:14 +00:00 |
|
Thien Tran
|
4f044b1d67
|
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-25 09:34:59 +00:00 |
|
Siyuan Liu
|
4157f563b4
|
[Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-25 01:43:00 -07:00 |
|