Brendan Wong
168cab6bbf
[Frontend] API support for beam search ( #9087 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-05 23:39:03 -07:00
Andy Dai
5df1834895
[Bugfix] Fix order of arguments matters in config.yaml ( #8960 )
2024-10-05 17:35:11 +00:00
Chen Zhang
cfadb9c687
[Bugfix] Deprecate registration of custom configs to huggingface ( #9083 )
2024-10-05 21:56:40 +08:00
Xin Yang
15986f598c
[Model] Support Gemma2 embedding model ( #9004 )
2024-10-05 06:57:05 +00:00
ElizaWszola
05d686432f
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE ( #8973 )
...
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
2024-10-04 12:34:44 -06:00
Flávia Béo
0dcc8cbe5a
Adds truncate_prompt_tokens param for embeddings creation ( #8999 )
...
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
2024-10-04 18:31:40 +00:00
Roger Wang
26aa325f4f
[Core][VLM] Test registration for OOT multimodal models ( #8717 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Prashant Gupta
9ade8bbc8d
[Model] add a bunch of supported lora modules for mixtral ( #9008 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2024-10-04 16:24:40 +00:00
Cyrus Leung
0e36fd4909
[Misc] Move registry to its own file ( #9064 )
2024-10-04 10:01:37 +00:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support ( #7168 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
代君
3dbb215b38
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model ( #8405 )
2024-10-04 10:36:39 +08:00
sroy745
91add85ec4
Fix failing spec decode test ( #9054 )
2024-10-03 23:07:29 +00:00
youkaichao
9aaf14c62e
[misc] add forward context for attention ( #9029 )
2024-10-03 12:09:42 -07:00
xendo
63e39937f9
[Frontend] [Neuron] Parse literals out of override-neuron-config ( #8959 )
...
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-10-03 18:02:07 +00:00
Guillaume Calmettes
83caf35e08
[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser ( #9020 )
2024-10-03 16:44:52 +08:00
Shawn Tan
19f0d25796
[Model] Adding Granite MoE. ( #8206 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-03 09:33:57 +08:00
afeldman-nm
563649aafe
[Core] Combined support for multi-step scheduling, chunked prefill & prefix caching ( #8804 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
2024-10-02 07:52:20 +00:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion ( #8839 )
2024-10-01 16:04:42 -07:00
Isotr0py
bc4eb65b54
[Bugfix] Fix Fuyu tensor parallel inference ( #8986 )
2024-10-01 17:51:41 +08:00
Joe Runde
062c89e7c9
[Frontend][Core] Move guided decoding params into sampling params ( #8252 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-01 09:34:25 +08:00
Lily Liu
bce324487a
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. ( #8975 )
2024-10-01 00:51:40 +00:00
Mor Zusman
f13a07b1f8
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model ( #8533 )
2024-09-29 17:35:58 -04:00
danieljannai21
6c9ba48fde
[Frontend] Added support for HF's new continue_final_message parameter ( #8942 )
2024-09-29 17:59:47 +00:00
Jee Jee Li
3d49776bbb
[Model][LoRA]LoRA support added for MiniCPMV2.5 ( #7199 )
2024-09-29 06:59:45 +00:00
Cyrus Leung
26a68d5d7e
[CI/Build] Add test decorator for minimum GPU memory ( #8925 )
2024-09-29 02:50:51 +00:00
ElizaWszola
d081da0064
[Bugfix] Fix Marlin MoE act order when is_k_full == False ( #8741 )
...
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-28 18:19:40 -07:00
sroy745
5bf8789b2a
[Bugfix] Block manager v2 with preemption and lookahead slots ( #8824 )
2024-09-29 09:17:45 +08:00
Cyrus Leung
e1a3f5e831
[CI/Build] Update models tests & examples ( #8874 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-28 09:54:35 -07:00
Varun Sundar Rabindranath
19d02ff938
[Bugfix] Fix PP for Multi-Step ( #8887 )
2024-09-28 08:52:46 -07:00
Varun Sundar Rabindranath
c2ec430ab5
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path ( #8378 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-09-27 13:32:07 -07:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method ( #7271 )
2024-09-27 14:25:10 -04:00
youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling ( #8875 )
2024-09-27 08:11:32 -07:00
Isotr0py
6d792d2f31
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 ( #8892 )
2024-09-27 01:15:58 -07:00
Cyrus Leung
3b00b9c26c
[Core] renamePromptInputs and inputs ( #8876 )
2024-09-26 20:35:15 -07:00
Maximilien de Bayser
344cd2b6f4
[Feature] Add support for Llama 3.1 and 3.2 tool use ( #8343 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade ( #8829 )
2024-09-26 16:46:43 -07:00
Chirag Jain
ee2da3e9ef
fix validation: Only set tool_choice auto if at least one tool is provided ( #8568 )
2024-09-26 16:23:17 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 ) ( #8810 )
2024-09-25 10:36:26 -07:00
Michael Goin
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors ( #8588 )
2024-09-25 09:43:36 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests ( #8479 )
2024-09-25 08:35:52 -06:00
sroy745
fc3afc20df
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 ( #8752 )
2024-09-24 21:26:36 -07:00
sroy745
ee777d9c30
Fix test_schedule_swapped_simple in test_scheduler.py ( #8780 )
2024-09-24 21:26:18 -07:00
Joe Runde
6e0c9d6bd0
[Bugfix] Use heartbeats instead of health checks ( #8583 )
2024-09-24 20:37:38 -07:00
Travis Johnson
01b6f9e1f0
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding ( #8047 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-09-24 17:29:56 -07:00
Jee Jee Li
13f9f7a3d0
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00
Andy
2529d09b5a
[Frontend] Batch inference for llm.chat() API ( #8648 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-09-24 09:44:11 -07:00
Alex Brooks
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg ( #8658 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-24 07:36:46 +00:00
Simon Mo
3185fb0cca
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" ( #8750 )
2024-09-24 05:45:20 +00:00