Burkhard Ringlein
|
f710090d8e
|
[Kernel] adding fused moe kernel config for L40S TP4 (#9245)
|
2024-10-11 08:54:22 -07:00 |
|
Tyler Michael Smith
|
7342a7d7f8
|
[Model] Support Mamba (#6484)
|
2024-10-11 15:40:06 +00:00 |
|
Cyrus Leung
|
e808156f30
|
[Misc] Collect model support info in a single process per model (#9233)
|
2024-10-11 11:08:11 +00:00 |
|
youkaichao
|
cbc2ef5529
|
[misc] hide best_of from engine (#9261)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
|
2024-10-10 21:30:44 -07:00 |
|
youkaichao
|
e00c094f15
|
[torch.compile] generic decorators (#9258)
|
2024-10-10 15:54:23 -07:00 |
|
youkaichao
|
e4d652ea3e
|
[torch.compile] integration with compilation control (#9058)
|
2024-10-10 12:39:36 -07:00 |
|
whyiug
|
04de9057ab
|
[Model] support input image embedding for minicpmv (#9237)
|
2024-10-10 15:00:47 +00:00 |
|
Isotr0py
|
07c11cf4d4
|
[Bugfix] Fix lm_head weights tying with lora for llama (#9227)
|
2024-10-10 21:11:56 +08:00 |
|
youkaichao
|
de895f1697
|
[misc] improve model support check in another process (#9208)
|
2024-10-09 21:58:27 -07:00 |
|
Li, Jiang
|
ca77dd7a44
|
[Hardware][CPU] Support AWQ for CPU backend (#7515)
|
2024-10-09 10:28:08 -06:00 |
|
Cyrus Leung
|
8bfaa4e31e
|
[Bugfix] fix composite weight loading and EAGLE weight loading (#9160)
|
2024-10-09 00:36:55 -07:00 |
|
Hui Liu
|
cdc72e3c80
|
[Model] Remap FP8 kv_scale in CommandR and DBRX (#9174)
|
2024-10-09 06:43:06 +00:00 |
|
chenqianfzh
|
2f4117c38e
|
support bitsandbytes quantization with more models (#9148)
|
2024-10-08 19:52:19 -06:00 |
|
Cyrus Leung
|
151ef4efd2
|
[Model] Support NVLM-D and fix QK Norm in InternViT (#9045)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2024-10-07 11:55:12 +00:00 |
|
Isotr0py
|
f19da64871
|
[Core] Refactor GGUF parameters packing and forwarding (#8859)
|
2024-10-07 10:01:46 +00:00 |
|
Cyrus Leung
|
8c6de96ea1
|
[Model] Explicit interface for vLLM models and support OOT embedding models (#9108)
|
2024-10-07 06:10:35 +00:00 |
|
youkaichao
|
18b296fdb2
|
[core] remove beam search from the core (#9105)
|
2024-10-07 05:47:04 +00:00 |
|
Cyrus Leung
|
b22b798471
|
[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-10-06 16:35:27 +08:00 |
|
Xin Yang
|
15986f598c
|
[Model] Support Gemma2 embedding model (#9004)
|
2024-10-05 06:57:05 +00:00 |
|
hhzhang16
|
53b3a33027
|
[Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979)
|
2024-10-04 22:05:37 -07:00 |
|
Chongming Ni
|
cc90419e89
|
[Hardware][Neuron] Add on-device sampling support for Neuron (#8746)
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
|
2024-10-04 16:42:20 -07:00 |
|
ElizaWszola
|
05d686432f
|
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
|
2024-10-04 12:34:44 -06:00 |
|
Roger Wang
|
26aa325f4f
|
[Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:38:25 -07:00 |
|
Prashant Gupta
|
9ade8bbc8d
|
[Model] add a bunch of supported lora modules for mixtral (#9008)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
|
2024-10-04 16:24:40 +00:00 |
|
whyiug
|
3d826d2c52
|
[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071)
|
2024-10-04 14:34:58 +00:00 |
|
Cyrus Leung
|
0e36fd4909
|
[Misc] Move registry to its own file (#9064)
|
2024-10-04 10:01:37 +00:00 |
|
Murali Andoorveedu
|
0f6d7a9a34
|
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:56:58 +08:00 |
|
Domen Vreš
|
2838d6b38e
|
[Bugfix] Weight loading fix for OPT model (#9042)
Co-authored-by: dvres <dvres@fri.uni-lj.si>
|
2024-10-03 19:53:29 -04:00 |
|
Divakar Verma
|
01843c89b8
|
[Misc] log when using default MoE config (#8971)
|
2024-10-03 04:31:07 +00:00 |
|
Shawn Tan
|
19f0d25796
|
[Model] Adding Granite MoE. (#8206)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-03 09:33:57 +08:00 |
|
Sergey Shlyapnikov
|
f58d4fccc9
|
[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192)
|
2024-10-02 17:50:01 -04:00 |
|
Lily Liu
|
1570203864
|
[Spec Decode] (1/2) Remove batch expansion (#8839)
|
2024-10-01 16:04:42 -07:00 |
|
Cyrus Leung
|
4f341bd4bf
|
[Doc] Update list of supported models (#8987)
|
2024-10-02 00:35:39 +08:00 |
|
Alex Brooks
|
1fe0a4264a
|
[Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders (#8991)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-01 09:52:44 +00:00 |
|
Isotr0py
|
bc4eb65b54
|
[Bugfix] Fix Fuyu tensor parallel inference (#8986)
|
2024-10-01 17:51:41 +08:00 |
|
Divakar Verma
|
82f3937e59
|
[Misc] add process_weights_after_loading for DummyLoader (#8969)
|
2024-10-01 03:46:41 +00:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
Isotr0py
|
2ae25f79cf
|
[Model] Expose InternVL2 max_dynamic_patch as a mm_processor_kwarg (#8946)
|
2024-09-30 13:01:20 +08:00 |
|
Jee Jee Li
|
8e60afa15e
|
[Model][LoRA]LoRA support added for MiniCPMV2.6 (#8943)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-30 04:31:55 +00:00 |
|
whyiug
|
e01ab595d8
|
[Model] support input embeddings for qwen2vl (#8856)
|
2024-09-30 03:16:10 +00:00 |
|
Mor Zusman
|
f13a07b1f8
|
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)
|
2024-09-29 17:35:58 -04:00 |
|
Jee Jee Li
|
3d49776bbb
|
[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)
|
2024-09-29 06:59:45 +00:00 |
|
Zilin Zhu
|
bc2ef1f77c
|
[Model] Support Qwen2.5-Math-RM-72B (#8896)
|
2024-09-28 21:19:39 -07:00 |
|
ElizaWszola
|
d081da0064
|
[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-09-28 18:19:40 -07:00 |
|
Cyrus Leung
|
e1a3f5e831
|
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-28 09:54:35 -07:00 |
|
Lucas Wilkinson
|
c5d55356f9
|
[Bugfix] fix for deepseek w4a16 (#8906)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-09-27 13:12:34 -06:00 |
|
Luka Govedič
|
172d1cd276
|
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
|
2024-09-27 14:25:10 -04:00 |
|
Isotr0py
|
6d792d2f31
|
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892)
|
2024-09-27 01:15:58 -07:00 |
|
Roger Wang
|
4bb98f2190
|
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837)
|
2024-09-26 07:45:30 -07:00 |
|
Michael Goin
|
7193774b1f
|
[Misc] Support quantization of MllamaForCausalLM (#8822)
|
2024-09-25 14:46:22 -07:00 |
|