Chengji Yao
3a1d8940ae
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-20 03:01:00 +00:00
XiongfeiWei
d4170fad39
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
Chengji Yao
eb58f5953d
[TPU][Bugfix] fix test_pallas (#20666)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-09 09:32:48 -07:00
QiliangCui
d8ee5a2ca4
[TPU][Bugfix] disable phi-3 test (#20632)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-08 23:14:26 +00:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Chengji Yao
7da296be04
[TPU] kv cache update kernel supports dynamic grid (#20235)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-02 06:33:37 +00:00
Woosuk Kwon
2863befce3
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Chengji Yao
04e1642e32
[TPU] add kv cache update kernel (#19928)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-06-26 10:01:37 -07:00
Chenyaaang
2d7620c3eb
[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-25 15:51:02 -07:00
Chenyaaang
33d5e29be9
[TPU] Fix tpu model runner test (#19995)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-23 16:04:28 -07:00
Chengji Yao
a77aea59fd
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-16 06:40:53 +00:00
Nick Hill
646d62f636
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-10 07:01:17 +02:00
Siyuan Liu
7d44c469fe
[TPU]Fix KV cache sharing tests (#19371)
2025-06-09 18:38:15 -04:00
Nick Hill
46ecc57973
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:28:17 -07:00
Siyuan Liu
7ee2590478
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 16:13:43 -04:00
Siyuan Liu
8e972d9c44
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 01:43:00 -07:00
Yong Hoon Shin
bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Siyuan Liu
9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
Carol Zheng
fba02e3bd1
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-30 18:04:03 +08:00
Jevin Jiang
a463555dee
[TPU] Fix the test_sampler (#17820)
2025-05-08 05:51:33 -04:00
Cyrus Leung
8a15c2603a
[Frontend] Add missing chat templates for various MLLMs (#17758)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-07 00:10:01 -07:00
Nicolò Lucchesi
5941e0b7ea
[TPU][V1] Add support for top-logprobs (#17072)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-05-05 14:20:15 -07:00
XiongfeiWei
9765940824
[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-05-05 14:19:58 -07:00
Siyuan Liu
dbc18e7816
[CI][TPU] Skip Multimodal test (#17488)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-04-30 19:51:39 -07:00
Nicolò Lucchesi
a7d5b016bd
[TPU][V1][CI] Update regression test baseline for v6 CI (#17064)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-30 04:03:22 -07:00
Nick Hill
df6f3ce883
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-25 23:41:05 -07:00
Michael Goin
14288d1332
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 02:50:09 -07:00
Chenyaaang
83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00
Nicolò Lucchesi
fa3bba2a53
[TPU][V1] Enable Top-P (#16843)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-22 00:46:07 +00:00
Nicolò Lucchesi
210207525e
[TPU][V1] Capture multimodal encoder during model compilation (#15051)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
2025-04-21 18:36:59 -06:00
Chengji Yao
471fe65630
[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-21 15:43:13 -06:00
Nicolò Lucchesi
eb5819b2d9
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-17 18:18:11 +00:00
Nicolò Lucchesi
5989f4684d
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-17 18:09:57 +00:00
Nicolò Lucchesi
b3f2fddd17
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-14 17:01:05 +00:00
Nicolò Lucchesi
3cc9af88ff
[TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:05:44 -04:00
Chengji Yao
a454748544
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 18:51:51 -06:00
Chengji Yao
b1eb4ca152
[TPU] Update PyTorch/XLA (#16288)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 14:46:32 +08:00
iefgnoix
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
Hyesoo Yang
1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00
Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
Alexander Matveev
c3f687ac22
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-28 20:19:04 +00:00
Robert Shaw
2d9045fce8
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 00:01:26 -07:00
Robert Shaw
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 19:45:05 -07:00
Nicolò Lucchesi
4098b72210
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-27 19:15:06 +00:00
Chenyaaang
ac3cd6e83c
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-25 17:27:22 -04:00
yarongmu-google
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-25 12:27:16 -04:00
Nicolò Lucchesi
cfbb8c930f
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00
Hyesoo Yang
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00
Nicolò Lucchesi
d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-19 21:00:39 -07:00