Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test ( #15414 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
shangmingc
239b7befdd
[V1][Spec Decode] Remove deprecated spec decode config params ( #15466 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-31 09:19:35 -07:00
Julien Denize
6909a76201
[Bugfix] Fix Mistral guided generation using xgrammar ( #15704 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-03-29 20:20:19 -07:00
Chauncey
045533716b
[CI] xgrammar structured output supports Enum. ( #15757 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-29 20:20:02 -07:00
Russell Bryant
7a7992085b
[CI] Speed up V1 structured output tests ( #15718 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 21:10:45 -07:00
Alexander Matveev
c3f687ac22
[V1] TPU - Fix the chunked prompt bug ( #15713 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-28 20:19:04 +00:00
Russell Bryant
7329ff5468
[V1] Support disable_any_whtespace for guidance backend ( #15584 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
Chauncey
3b00ff9138
[Bugfix][v1] xgrammar structured output supports Enum. ( #15594 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-28 06:14:53 -07:00
Robert Shaw
2d9045fce8
[TPU][CI] Fix TPUModelRunner Test ( #15667 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 00:01:26 -07:00
Robert Shaw
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU ( #15649 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 19:45:05 -07:00
Nick Hill
15dac210f0
[V1] AsyncLLM data parallel ( #13923 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-27 16:14:41 -07:00
Nicolò Lucchesi
4098b72210
[Bugfix][TPU][V1] Fix recompilation ( #15553 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-27 19:15:06 +00:00
Cody Yu
54aa619459
[V1] Refactor num_computed_tokens logic ( #15307 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-27 04:54:36 +00:00
marko
27df5199d9
Support SHA256 as hash function in prefix caching ( #15297 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2025-03-26 11:11:28 -07:00
Nick Hill
35fad35a48
[V1][Sampler] Faster top-k only implementation ( #15478 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-26 10:56:47 -07:00
Chenyaaang
ac3cd6e83c
[core] add bucket padding to tpu_model_runner ( #14995 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-25 17:27:22 -04:00
Lu Fang
082ab86f5f
[V1] Support long_prefill_token_threshold in v1 scheduler ( #15419 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-25 14:22:26 -07:00
yarongmu-google
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. ( #14843 )
...
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-25 12:27:16 -04:00
Russell Bryant
a09ad90a72
[V1] guidance backend for structured output + auto fallback mode ( #14779 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
2025-03-24 21:02:33 -07:00
Woosuk Kwon
ebcebeeb6b
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling ( #15063 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-24 17:16:46 -07:00
Nick Hill
9d72daf4ce
[V1][Perf] Simpler request output queues ( #15156 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-24 22:44:08 +00:00
Woosuk Kwon
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max ( #15348 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:41:44 -07:00
shangmingc
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests ( #14434 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
Russell Bryant
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar ( #15316 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 15:56:17 +00:00
Nicolò Lucchesi
cfbb8c930f
[TPU][V1] MHA Pallas backend ( #15288 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00
Chen Zhang
93a00d7dde
[v1] Refactor KVCacheConfig ( #14079 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-21 04:56:27 -07:00
Hyesoo Yang
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk ( #15242 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00
Woosuk Kwon
0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface ( #15250 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-20 17:50:43 -07:00
Jason
d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id ( #15043 )
...
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
2025-03-20 10:01:02 -07:00
Nicolò Lucchesi
d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention ( #14227 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-19 21:00:39 -07:00
Murali Andoorveedu
61c7a1b856
[V1] Minor V1 async engine test refactor ( #15075 )
...
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
2025-03-19 10:37:17 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults ( #14858 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00
Alexander Matveev
72a8639b68
[V1] TPU - CI/CD use smaller model ( #15054 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-18 21:39:21 +00:00
Woosuk Kwon
99abb8b650
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels ( #14930 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-18 14:31:54 -07:00
Aaron Pham
c0efdd655b
[Fix][Structured Output] using vocab_size to construct matcher ( #14868 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-03-17 11:42:45 -04:00
vllmellm
2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. ( #14810 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00
Lily Liu
8d6cf89526
[V1] [Spec Decode] Support random sampling for spec decode ( #13933 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 22:00:20 -07:00
Sibi
a73e183e36
[Misc] Replace os environ to monkeypatch in test suite ( #14516 )
...
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
Robert Shaw
aecc780dba
[V1] Enable Entrypoints Tests ( #14903 )
2025-03-16 17:56:16 -07:00
Nick Hill
fc1f67715d
[BugFix][V1] Fix overhead related to bad_words sampling when not in use ( #14894 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-16 14:53:34 -07:00
Lily Liu
d1ad2a57af
[V1] [Spec Decode] Fix ngram tests ( #14878 )
2025-03-16 00:29:22 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
Russell Bryant
46f98893dd
[V1] Fix model parameterization for structured output tests ( #14833 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 20:55:18 +00:00
afeldman-nm
02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output ( #14624 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00
Nick Hill
f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts ( #14512 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-12 10:29:48 -07:00
Benjamin Chislett
5c538c37b2
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing ( #14645 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-11 22:12:41 -07:00
Aaron Pham
77a318bd01
[V1][Core] Support MistralTokenizer for Structured Output ( #14625 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-12 10:40:09 +08:00
Russell Bryant
4bf82d4b90
[V1] Add regex structured output support with xgrammar ( #14590 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 23:03:44 +08:00
22quinn
eb8b5eb183
[V1] Support bad_words in sampler ( #13376 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-08 14:50:26 -08:00
Alexander Matveev
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray ( #13618 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00