Travis Johnson
01b6f9e1f0
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding ( #8047 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-09-24 17:29:56 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step ( #7652 )
2024-08-29 19:19:08 -07:00
Jonas M. Kübler
f205c09854
[Bugfix] Unify rank computation across regular decoding and speculative decoding ( #7899 )
2024-08-28 22:18:13 -07:00
Nick Hill
1856aff4d6
[Spec Decoding] Streamline batch expansion tensor manipulation ( #7851 )
2024-08-25 15:45:14 -07:00
Travis Johnson
cc0eaf12b1
[Bugfix] spec decode handle None entries in topk args in create_sequence_group_output ( #7232 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-08-22 09:33:48 -04:00
Abhinav Goyal
312f761232
[Speculative Decoding] Fixing hidden states handling in batch expansion ( #7508 )
2024-08-19 17:58:14 -07:00
Cade Daniel
82a1b1a82b
[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification ( #6963 )
2024-08-05 08:46:44 +00:00
sroy745
14f91fe67c
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. ( #6485 )
2024-07-20 23:58:58 -07:00
Joshua Rosenkranz
b12518d3cf
[Model] MLPSpeculator speculative decoding support ( #4947 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
2024-06-20 20:23:12 -04:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
Nick Hill
a008629807
[Misc] Various simplifications and typing fixes ( #5368 )
2024-06-11 10:29:02 +08:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API ( #3734 )
2024-05-11 11:30:37 -07:00
Cade Daniel
ab50275111
[Speculative decoding] Support target-model logprobs ( #4378 )
2024-05-03 15:52:01 -07:00
leiwen83
b38e42fbca
[Speculative decoding] Add ngram prompt lookup decoding ( #4237 )
...
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-05-01 11:13:03 -07:00
Cade Daniel
e95cd87959
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine ( #3894 )
2024-04-16 13:09:21 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Cade Daniel
8437bae6ef
[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling ( #3103 )
2024-03-08 23:32:46 -08:00