Chenyaaang
ac3cd6e83c
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-25 17:27:22 -04:00
Lu Fang
082ab86f5f
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-25 14:22:26 -07:00
yarongmu-google
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-25 12:27:16 -04:00
Russell Bryant
a09ad90a72
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
2025-03-24 21:02:33 -07:00
Woosuk Kwon
ebcebeeb6b
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-24 17:16:46 -07:00
Nick Hill
9d72daf4ce
[V1][Perf] Simpler request output queues (#15156)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-24 22:44:08 +00:00
Woosuk Kwon
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:41:44 -07:00
shangmingc
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
Russell Bryant
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 15:56:17 +00:00
Nicolò Lucchesi
cfbb8c930f
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00
Chen Zhang
93a00d7dde
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-21 04:56:27 -07:00
Hyesoo Yang
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00
Woosuk Kwon
0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-20 17:50:43 -07:00
Jason
d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
2025-03-20 10:01:02 -07:00
Nicolò Lucchesi
d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-19 21:00:39 -07:00
Murali Andoorveedu
61c7a1b856
[V1] Minor V1 async engine test refactor (#15075)
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
2025-03-19 10:37:17 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00
Alexander Matveev
72a8639b68
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-18 21:39:21 +00:00
Woosuk Kwon
99abb8b650
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-18 14:31:54 -07:00
Aaron Pham
c0efdd655b
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-03-17 11:42:45 -04:00
vllmellm
2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00
Lily Liu
8d6cf89526
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 22:00:20 -07:00
Sibi
a73e183e36
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
Robert Shaw
aecc780dba
[V1] Enable Entrypoints Tests (#14903)
2025-03-16 17:56:16 -07:00
Nick Hill
fc1f67715d
[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-16 14:53:34 -07:00
Lily Liu
d1ad2a57af
[V1] [Spec Decode] Fix ngram tests (#14878)
2025-03-16 00:29:22 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
Russell Bryant
46f98893dd
[V1] Fix model parameterization for structured output tests (#14833)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 20:55:18 +00:00
afeldman-nm
02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00
Nick Hill
f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts (#14512)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-12 10:29:48 -07:00
Benjamin Chislett
5c538c37b2
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-11 22:12:41 -07:00
Aaron Pham
77a318bd01
[V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-12 10:40:09 +08:00
Russell Bryant
4bf82d4b90
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 23:03:44 +08:00
22quinn
eb8b5eb183
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-08 14:50:26 -08:00
Alexander Matveev
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00
afeldman-nm
ef64044079
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
2025-03-08 01:48:12 +00:00
Nick Hill
8ed5421aaa
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-07 10:56:00 -08:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
Himanshu Jaju
cd579352bf
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-06 10:40:24 -08:00
Harry Mellor
bf0560bda9
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-06 08:34:22 -08:00
Lucas Wilkinson
f6bb18fd9a
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-05 17:10:13 -08:00
Lu Fang
53ea6ad830
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-05 21:41:18 +00:00
Vincent
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-05 20:22:43 +00:00
Robert Shaw
257e200a25
[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-03-05 14:18:55 +00:00
Nick Hill
5db6b2c961
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-04 15:06:47 +00:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing (#13971)
2025-03-02 17:34:51 -08:00
Chen Zhang
28943d36ce
[v1] Move block pool operations to a separate class (#13973)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-28 20:53:31 +00:00
Chen Zhang
e7bd944e08
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-02-28 19:03:16 +00:00
Lily Liu
5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
2025-02-25 18:14:48 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-24 08:29:41 -08:00