Varad Ahirwadkar
|
e5dc713c23
|
[Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039)
Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>
|
2024-10-04 17:24:42 +00:00 |
|
Simon Mo
|
36eecfbddb
|
Remove AMD Ray Summit Banner (#9075)
|
2024-10-04 10:17:16 -07:00 |
|
Prashant Gupta
|
9ade8bbc8d
|
[Model] add a bunch of supported lora modules for mixtral (#9008)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
|
2024-10-04 16:24:40 +00:00 |
|
Lucas Wilkinson
|
22482e495e
|
[Bugfix] Flash attention arches not getting set properly (#9062)
|
2024-10-04 09:43:15 -06:00 |
|
whyiug
|
3d826d2c52
|
[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071)
|
2024-10-04 14:34:58 +00:00 |
|
Cyrus Leung
|
0e36fd4909
|
[Misc] Move registry to its own file (#9064)
|
2024-10-04 10:01:37 +00:00 |
|
Murali Andoorveedu
|
0f6d7a9a34
|
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:56:58 +08:00 |
|
Michael Goin
|
303d44790a
|
[Misc] Enable multi-step output streaming by default (#9047)
|
2024-10-03 22:55:42 -04:00 |
|
Lucas Wilkinson
|
aeb37c2a72
|
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)
|
2024-10-03 22:55:25 -04:00 |
|
代君
|
3dbb215b38
|
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)
|
2024-10-04 10:36:39 +08:00 |
|
Domen Vreš
|
2838d6b38e
|
[Bugfix] Weight loading fix for OPT model (#9042)
Co-authored-by: dvres <dvres@fri.uni-lj.si>
|
2024-10-03 19:53:29 -04:00 |
|
sroy745
|
91add85ec4
|
Fix failing spec decode test (#9054)
|
2024-10-03 23:07:29 +00:00 |
|
youkaichao
|
9aaf14c62e
|
[misc] add forward context for attention (#9029)
|
2024-10-03 12:09:42 -07:00 |
|
xendo
|
63e39937f9
|
[Frontend] [Neuron] Parse literals out of override-neuron-config (#8959)
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
|
2024-10-03 18:02:07 +00:00 |
|
sroy745
|
f5d72b2fc6
|
[Core] Make BlockSpaceManagerV2 the default BlockManager to use. (#8678)
|
2024-10-03 09:44:21 -07:00 |
|
Guillaume Calmettes
|
83caf35e08
|
[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020)
|
2024-10-03 16:44:52 +08:00 |
|
Divakar Verma
|
01843c89b8
|
[Misc] log when using default MoE config (#8971)
|
2024-10-03 04:31:07 +00:00 |
|
Travis Johnson
|
19a4dd0990
|
[Bugfix] example template should not add parallel_tool_prompt if tools is none (#9007)
|
2024-10-03 03:04:17 +00:00 |
|
Nick Hill
|
18c2e30c57
|
[Doc] Update Granite model docs (#9025)
|
2024-10-03 02:42:24 +00:00 |
|
Shawn Tan
|
19f0d25796
|
[Model] Adding Granite MoE. (#8206)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-03 09:33:57 +08:00 |
|
Sergey Shlyapnikov
|
f58d4fccc9
|
[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192)
|
2024-10-02 17:50:01 -04:00 |
|
Varun Sundar Rabindranath
|
afb050b29d
|
[Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-10-02 19:44:39 +00:00 |
|
Alex Brooks
|
7f60520deb
|
[Misc] Update Default Image Mapper Error Log (#8977)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-10-02 11:44:38 +00:00 |
|
afeldman-nm
|
563649aafe
|
[Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
|
2024-10-02 07:52:20 +00:00 |
|
Lily Liu
|
1570203864
|
[Spec Decode] (1/2) Remove batch expansion (#8839)
|
2024-10-01 16:04:42 -07:00 |
|
vlsav
|
22f5851b80
|
Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997)
|
2024-10-01 11:07:06 -07:00 |
|
Cyrus Leung
|
4f341bd4bf
|
[Doc] Update list of supported models (#8987)
|
2024-10-02 00:35:39 +08:00 |
|
Sebastian Schoennenbeck
|
35bd215168
|
[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)
|
2024-10-01 09:58:06 +00:00 |
|
Alex Brooks
|
1fe0a4264a
|
[Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders (#8991)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-01 09:52:44 +00:00 |
|
Isotr0py
|
bc4eb65b54
|
[Bugfix] Fix Fuyu tensor parallel inference (#8986)
|
2024-10-01 17:51:41 +08:00 |
|
Divakar Verma
|
82f3937e59
|
[Misc] add process_weights_after_loading for DummyLoader (#8969)
|
2024-10-01 03:46:41 +00:00 |
|
youkaichao
|
7da2487591
|
[torch.compile] fix tensor alias (#8982)
|
2024-10-01 03:40:48 +00:00 |
|
Kevin H. Luu
|
aaccca2b4d
|
[CI/Build] Fix machete generated kernel files ordering (#8976)
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-10-01 03:33:12 +00:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
Lily Liu
|
bce324487a
|
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)
|
2024-10-01 00:51:40 +00:00 |
|
Kevin H. Luu
|
1425a1bcf9
|
[ci] Add CODEOWNERS for test directories (#8795)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-10-01 00:47:08 +00:00 |
|
Jee Jee Li
|
1cabfcefb6
|
[Misc] Adjust max_position_embeddings for LoRA compatibility (#8957)
|
2024-09-30 12:57:39 +00:00 |
|
Sebastian Schoennenbeck
|
be76e5aabf
|
[Core] Make scheduling policy settable via EngineArgs (#8956)
|
2024-09-30 12:28:44 +00:00 |
|
Isotr0py
|
2ae25f79cf
|
[Model] Expose InternVL2 max_dynamic_patch as a mm_processor_kwarg (#8946)
|
2024-09-30 13:01:20 +08:00 |
|
Jee Jee Li
|
8e60afa15e
|
[Model][LoRA]LoRA support added for MiniCPMV2.6 (#8943)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-30 04:31:55 +00:00 |
|
Roger Wang
|
b6d7392579
|
[Misc][CI/Build] Include cv2 via mistral_common[opencv] (#8951)
|
2024-09-30 04:28:26 +00:00 |
|
whyiug
|
e01ab595d8
|
[Model] support input embeddings for qwen2vl (#8856)
|
2024-09-30 03:16:10 +00:00 |
|
Mor Zusman
|
f13a07b1f8
|
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)
|
2024-09-29 17:35:58 -04:00 |
|
danieljannai21
|
6c9ba48fde
|
[Frontend] Added support for HF's new continue_final_message parameter (#8942)
|
2024-09-29 17:59:47 +00:00 |
|
juncheoll
|
1fb9c1b0bf
|
[Misc] Fix typo in BlockSpaceManagerV1 (#8944)
|
2024-09-29 15:05:54 +00:00 |
|
Nick Hill
|
31f46a0d35
|
[BugFix] Fix seeded random sampling with encoder-decoder models (#8870)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-29 09:43:14 +00:00 |
|
Jee Jee Li
|
3d49776bbb
|
[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)
|
2024-09-29 06:59:45 +00:00 |
|
Zilin Zhu
|
bc2ef1f77c
|
[Model] Support Qwen2.5-Math-RM-72B (#8896)
|
2024-09-28 21:19:39 -07:00 |
|
Tyler Michael Smith
|
2e7fe7e79f
|
[Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930)
|
2024-09-29 03:13:01 +00:00 |
|
Cyrus Leung
|
26a68d5d7e
|
[CI/Build] Add test decorator for minimum GPU memory (#8925)
|
2024-09-29 02:50:51 +00:00 |
|