Robert Shaw
80c751e7f6
[V1] Simplify Shutdown ( #11659 )
2025-01-03 17:25:38 +00:00
Robert Shaw
4fb8e329fd
[V1] [5/N] API Server: unify Detokenizer and EngineCore input ( #11545 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-28 20:51:57 +00:00
Robert Shaw
df04dffade
[V1] [4/N] API Server: ZMQ/MP Utilities ( #11541 )
2024-12-28 01:45:08 +00:00
sroy745
dcb1a944d4
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler ( #10681 )
...
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-26 19:02:58 +09:00
Cody Yu
bf8717ebae
[V1] Prefix caching for vision language models ( #11187 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-12-17 16:37:59 -08:00
Alexander Matveev
4e11683368
[V1] VLM preprocessor hashing ( #11020 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-12 00:55:30 +00:00
Alexander Matveev
3bc94cab69
[V1] VLM - Run the mm_mapper preprocessor in the frontend process ( #10640 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-03 10:33:10 +00:00
Ricky Xu
d9b4b3f069
[Bug][CLI] Allow users to disable prefix caching explicitly ( #10724 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-27 23:59:28 -08:00
Ricky Xu
519e8e4182
[v1] EngineArgs for better config handling for v1 ( #10382 )
...
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-25 21:09:43 -08:00
youkaichao
25d806e953
[misc] add torch.compile compatibility check ( #10618 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-24 23:40:08 -08:00
Woosuk Kwon
112fa0bbe5
[V1] Fix CI tests on V1 engine ( #10272 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-12 16:17:20 -08:00
Robert Shaw
6ace6fba2c
[V1] AsyncLLM Implementation ( #9826 )
...
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00