Gregory Shtrasberg
|
e97f802b2d
|
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2025-01-23 18:04:03 +00:00 |
|
Jee Jee Li
|
84bee4bd5c
|
[Misc] Improve the readability of BNB error messages (#12320)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-22 16:56:54 +00:00 |
|
Cyrus Leung
|
59a0192fb9
|
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 15:00:59 +08:00 |
|
Martin Gleize
|
bbe5f9de7d
|
[Model] Support for fairseq2 Llama (#11442)
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
|
2025-01-19 10:40:40 -08:00 |
|
Isotr0py
|
edaae198e7
|
[Misc] Add BNB support to GLM4-V model (#12184)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-19 19:49:22 +08:00 |
|
Jee Jee Li
|
a3a3ee4e6f
|
[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 07:49:49 +08:00 |
|
youkaichao
|
d53575a5f0
|
[ci] fix gh200 tests (#11919)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-10 16:25:17 +08:00 |
|
Cyrus Leung
|
d848800e88
|
[Misc] Move print_*_once from utils to logger (#11298)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
|
2025-01-09 12:48:12 +08:00 |
|
Harry Mellor
|
aba8d6ee00
|
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-08 13:09:53 +00:00 |
|
Isotr0py
|
dde1fa18c9
|
[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-27 19:45:13 +00:00 |
|
Jee Jee Li
|
0240402c46
|
[Misc]Add BNB quantization for MolmoForCausalLM (#11551)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-27 18:48:24 +00:00 |
|
Cyrus Leung
|
eec906d811
|
[Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 13:12:51 +00:00 |
|
Cyrus Leung
|
3f3e92e1f2
|
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 18:22:22 +00:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Cyrus Leung
|
bf0e382e16
|
[Model] Composite weight loading for multimodal Qwen2 (#10944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 07:22:52 -07:00 |
|
Jee Jee Li
|
1f958a7d52
|
[Bugfix] Fix BNB loader target_modules (#10720)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:20:26 +08:00 |
|
Isotr0py
|
4c05edb33a
|
[Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-02 23:06:09 +00:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
Jee Jee Li
|
15cc2a9f1a
|
[Misc]Further reduce BNB static variable (#10597)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-26 22:54:12 -08:00 |
|
youkaichao
|
05d1f8c9c6
|
[misc] move functions to config.py (#10624)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 09:27:30 +00:00 |
|
Jee Jee Li
|
17d8fc1806
|
[bugfix] Fix example/tensorize_vllm_model tests (#10595)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-23 17:22:33 -08:00 |
|
Isotr0py
|
b6374e09b0
|
[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-22 15:01:56 +08:00 |
|
Russell Bryant
|
fd9f124971
|
[Doc] fix link for page that was renamed (#10455)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-19 09:48:30 -08:00 |
|
Yan Ma
|
6b2d25efc7
|
[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-11-18 11:18:05 -07:00 |
|
Isotr0py
|
c4e464333e
|
[Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-18 09:07:46 +08:00 |
|
youkaichao
|
4fd9375028
|
[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-16 18:02:14 -08:00 |
|
youkaichao
|
3a763ba0c3
|
[core][misc] keep compatibility for old-style classes (#10356)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-15 13:55:51 +00:00 |
|
youkaichao
|
504ac53d18
|
[misc] error early for old-style class (#10304)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-13 18:55:39 -08:00 |
|
HoangCongDuc
|
ac49b59d8b
|
[Bugfix] bitsandbytes models fail to run pipeline parallel (#10200)
Signed-off-by: Hoang Cong Duc <hoangcongducltt@gmail.com>
|
2024-11-13 09:56:39 -07:00 |
|
youkaichao
|
1a95f10ee7
|
[5/N] pass the whole config to model (#9983)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-09 14:17:28 +08:00 |
|
Aaron Pham
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
Jee Jee Li
|
b9c64c0ca7
|
[Misc] Modify BNB parameter name (#9997)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-05 14:40:08 -05:00 |
|
Jee Jee Li
|
fb2716d641
|
[Misc]Reduce BNB static variable (#9987)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-04 17:04:40 +00:00 |
|
youkaichao
|
8d72bb20fa
|
[4/N] make quant config first-class citizen (#9978)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-04 08:51:31 -08:00 |
|
Jee Jee Li
|
c49f0407ba
|
[Bugfix] Fix MiniCPMV and Mllama BNB bug (#9917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-04 03:36:41 +00:00 |
|
youkaichao
|
3bb4befea7
|
[bugfix] fix tests (#9959)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 15:54:05 -07:00 |
|
youkaichao
|
cea808f325
|
[3/N] model runner pass the whole config to model (#9958)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 12:08:49 -07:00 |
|
Went-Liang
|
81f09cfd80
|
[Model] Support math-shepherd-mistral-7b-prm model (#9697)
Signed-off-by: Went-Liang <wenteng_liang@163.com>
|
2024-10-30 09:33:42 -07:00 |
|
Michael Goin
|
bc73e9821c
|
[Bugfix] Fix prefix strings for quantized VLMs (#9772)
|
2024-10-29 16:02:59 -07:00 |
|
yannicks1
|
0ce7798f44
|
[Misc]: Typo fix: Renaming classes (casualLM -> causalLM) (#9801)
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
|
2024-10-29 10:39:20 -07:00 |
|
Isotr0py
|
09500f7dde
|
[Model] Add BNB quantization support for Mllama (#9720)
|
2024-10-29 08:20:02 -04:00 |
|
Mengqing Cao
|
5cbdccd151
|
[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716)
|
2024-10-26 10:59:06 +00:00 |
|
Shashwat Srijan
|
bb76538bbd
|
[Hardware][Neuron] Simplify model load for transformers-neuronx library (#9380)
|
2024-10-17 15:39:39 -07:00 |
|
Cyrus Leung
|
390be74649
|
[Misc] Print stack trace using logger.exception (#9461)
|
2024-10-17 13:55:48 +00:00 |
|
Tyler Michael Smith
|
7342a7d7f8
|
[Model] Support Mamba (#6484)
|
2024-10-11 15:40:06 +00:00 |
|
chenqianfzh
|
2f4117c38e
|
support bitsandbytes quantization with more models (#9148)
|
2024-10-08 19:52:19 -06:00 |
|
Chongming Ni
|
cc90419e89
|
[Hardware][Neuron] Add on-device sampling support for Neuron (#8746)
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
|
2024-10-04 16:42:20 -07:00 |
|
ElizaWszola
|
05d686432f
|
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
|
2024-10-04 12:34:44 -06:00 |
|
Cyrus Leung
|
0e36fd4909
|
[Misc] Move registry to its own file (#9064)
|
2024-10-04 10:01:37 +00:00 |
|