Simon Mo
|
c76ac49d26
|
[Docs] Add Nebius as sponsors (#10371)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-11-15 12:47:40 -08:00 |
|
Simon Mo
|
a6221a144a
|
[Misc] bump mistral common version (#10367)
Signed-off-by: simon-mo <simon.mo@hey.com>
Tag: v0.6.4.post1
|
2024-11-15 09:48:07 -08:00 |
|
ElizaWszola
|
79ee45b428
|
[Misc] Bump up test_fused_moe tolerance (#10364)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
|
2024-11-15 16:31:18 +00:00 |
|
Guillaume Calmettes
|
691a3ec047
|
[Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (#10363)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2024-11-15 14:50:40 +00:00 |
|
youkaichao
|
3a763ba0c3
|
[core][misc] keep compatibility for old-style classes (#10356)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-15 13:55:51 +00:00 |
|
shangmingc
|
f2056f726d
|
[Misc] Fix some help info of arg_utils to improve readability (#10362)
|
2024-11-15 12:40:30 +00:00 |
|
Jee Jee Li
|
1d65ec7eeb
|
[Bugfix] Fix fully sharded LoRA bug (#10352)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-15 10:34:58 +00:00 |
|
Xin Yang
|
26908554b2
|
[Doc] Remove float32 choice from --lora-dtype (#10348)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-11-15 10:22:57 +00:00 |
|
Cyrus Leung
|
b311efd0bd
|
[Misc] Fix import error in tensorizer tests and cleanup some code (#10349)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-15 09:34:17 +00:00 |
|
wchen61
|
3d158cdc8d
|
Add default value to avoid Falcon crash (#5363) (#10347)
Signed-off-by: wchen61 <wchen61@foxmail.com>
|
2024-11-15 08:52:20 +00:00 |
|
Simon Mo
|
02dbf30e9a
|
[Build] skip renaming files for release wheels pipeline (#9671)
Signed-off-by: simon-mo <simon.mo@hey.com>
Tag: v0.6.4
|
2024-11-14 23:31:52 -08:00 |
|
Cyrus Leung
|
2ac6d0e75b
|
[Misc] Consolidate pooler config overrides (#10351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-15 06:59:00 +00:00 |
|
Sky Lee
|
2ec8827288
|
[Bugfix] Qwen-vl output is inconsistent in speculative decoding (#10350)
|
2024-11-15 05:40:10 +00:00 |
|
Cyrus Leung
|
b40cf6402e
|
[Model] Support Qwen2 embeddings and use tags to select model tests (#10184)
|
2024-11-14 20:23:09 -08:00 |
|
Tyler Michael Smith
|
2885ba0e24
|
[Misc] Change RedundantReshapesPass and FusionPass logging from info to debug (#10308)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-15 02:44:26 +00:00 |
|
Luka Govedič
|
bf2ddc6610
|
[bugfix] Fix static asymmetric quantization case (#10334)
Signed-off-by: Daniël de Kok <me@danieldk.eu>
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
|
2024-11-15 09:35:11 +08:00 |
|
Cyrus Leung
|
972112d82f
|
[Bugfix] Fix unable to load some models (#10312)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 16:55:54 -08:00 |
|
Patrick von Platen
|
11cd1ae6ad
|
[Tool parsing] Improve / correct mistral tool parsing (#10333)
|
2024-11-15 00:42:49 +00:00 |
|
Zijin Xiao
|
554af9228d
|
[Bugfix] use AF_INET6 for OpenAI Compatible Server with ipv6 (#9583)
Signed-off-by: xiaozijin <xiaozijin@bytedance.com>
|
2024-11-14 16:38:53 -08:00 |
|
Murali Andoorveedu
|
b2e0ad3b59
|
[Perf] Reduce peak memory usage of llama (#10339)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
|
2024-11-15 00:38:20 +00:00 |
|
Maximilien de Bayser
|
4a18fd14ba
|
Support Roberta embedding models (#9387)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-11-14 21:23:29 +00:00 |
|
Woosuk Kwon
|
1dbae0329c
|
[Docs] Publish meetup slides (#10331)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-14 16:19:38 +00:00 |
|
Cyrus Leung
|
675d603400
|
[CI/Build] Make shellcheck happy (#10285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 09:47:53 +00:00 |
|
Isotr0py
|
03025c023f
|
[CI/Build] Fix CPU CI online inference timeout (#10314)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-14 16:45:32 +08:00 |
|
youkaichao
|
29f3ef26a3
|
[ci][distributed] disable hanging tests (#10317)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-14 00:23:39 -08:00 |
|
B-201
|
294bf467ba
|
[Model] Add BNB quantization support for Idefics3 (#10310)
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-14 06:31:44 +00:00 |
|
Guillaume Calmettes
|
52b48c1ead
|
[BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2024-11-14 04:48:16 +00:00 |
|
Mike Depinet
|
f67ce05d0b
|
[Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
|
2024-11-14 04:14:34 +00:00 |
|
Russell Bryant
|
e0853b6508
|
[Misc] format.sh: Simplify tool_version_check (#10305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-14 11:12:35 +08:00 |
|
youkaichao
|
504ac53d18
|
[misc] error early for old-style class (#10304)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-13 18:55:39 -08:00 |
|
Isotr0py
|
15bb8330aa
|
[Bugfix] Fix tensor parallel for qwen2 classification model (#10297)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-14 10:54:59 +08:00 |
|
HoangCongDuc
|
ac49b59d8b
|
[Bugfix] bitsandbytes models fail to run pipeline parallel (#10200)
Signed-off-by: Hoang Cong Duc <hoangcongducltt@gmail.com>
|
2024-11-13 09:56:39 -07:00 |
|
Cyrus Leung
|
0b8bb86bf1
|
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-13 12:39:03 +00:00 |
|
Roger Wang
|
bb7991aa29
|
[V1] Add missing tokenizer options for Detokenizer (#10288)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-13 11:02:56 +00:00 |
|
B-201
|
d909acf9fe
|
[Model][LoRA]LoRA support added for idefics3 (#10281)
Signed-off-by: B-201 <Joy25810@foxmail.com>
|
2024-11-13 17:25:59 +08:00 |
|
Pavani Majety
|
b6dde33019
|
[Core] Flashinfer - Remove advance step size restriction (#10282)
|
2024-11-13 16:29:32 +08:00 |
|
Austin Veselka
|
1b886aa104
|
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944)
Signed-off-by: FurtherAI <austin.veselka@lighton.ai>
Co-authored-by: FurtherAI <austin.veselka@lighton.ai>
|
2024-11-13 08:28:13 +00:00 |
|
电脑星人
|
3945c82346
|
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221)
Signed-off-by: imkero <kerorek@outlook.com>
|
2024-11-13 07:07:22 +00:00 |
|
Xin Yang
|
032fcf16ae
|
[Doc] Fix typo in arg_utils.py (#10264)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-11-12 21:54:52 -08:00 |
|
Dipika Sikka
|
56a955e774
|
Bump to compressed-tensors v0.8.0 (#10279)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2024-11-12 21:54:10 -08:00 |
|
Woosuk Kwon
|
bbd3e86926
|
[V1] Support VLMs with fine-grained scheduling (#9871)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-11-13 04:53:13 +00:00 |
|
youkaichao
|
0d4ea3fb5c
|
[core][distributed] use tcp store directly (#10275)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 17:36:08 -08:00 |
|
Woosuk Kwon
|
112fa0bbe5
|
[V1] Fix CI tests on V1 engine (#10272)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 16:17:20 -08:00 |
|
youkaichao
|
377b74fe87
|
Revert "[ci][build] limit cmake version" (#10271)
|
2024-11-12 15:06:48 -08:00 |
|
youkaichao
|
18081451f9
|
[doc] improve debugging doc (#10270)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 14:43:52 -08:00 |
|
youkaichao
|
96ae0eaeb2
|
[doc] fix location of runllm widget (#10266)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 14:34:39 -08:00 |
|
Woosuk Kwon
|
1f55e05713
|
[V1] Enable Inductor when using piecewise CUDA graphs (#10268)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 13:39:56 -08:00 |
|
Umesh
|
8a06428c70
|
[LoRA] Adds support for bias in LoRA (#5733)
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
|
2024-11-12 11:08:40 -08:00 |
|
sroy745
|
b41fb9d3b1
|
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
|
2024-11-12 10:53:57 -08:00 |
|
Woosuk Kwon
|
7c65527918
|
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 08:57:14 -08:00 |
|