3440 Commits

Author SHA1 Message Date
Simon Mo
c76ac49d26
[Docs] Add Nebius as sponsors (#10371)
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-11-15 12:47:40 -08:00
Simon Mo
a6221a144a
[Misc] bump mistral common version (#10367)
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.6.4.post1
2024-11-15 09:48:07 -08:00
ElizaWszola
79ee45b428
[Misc] Bump up test_fused_moe tolerance (#10364)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-15 16:31:18 +00:00
Guillaume Calmettes
691a3ec047
[Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (#10363)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-15 14:50:40 +00:00
youkaichao
3a763ba0c3
[core][misc] keep compatibility for old-style classes (#10356)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-15 13:55:51 +00:00
shangmingc
f2056f726d
[Misc] Fix some help info of arg_utils to improve readability (#10362)
2024-11-15 12:40:30 +00:00
Jee Jee Li
1d65ec7eeb
[Bugfix] Fix fully sharded LoRA bug (#10352)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-15 10:34:58 +00:00
Xin Yang
26908554b2
[Doc] Remove float32 choice from --lora-dtype (#10348)
Signed-off-by: Xin Yang <xyang19@gmail.com>
2024-11-15 10:22:57 +00:00
Cyrus Leung
b311efd0bd
[Misc] Fix import error in tensorizer tests and cleanup some code (#10349)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 09:34:17 +00:00
wchen61
3d158cdc8d
Add default value to avoid Falcon crash (#5363) (#10347)
Signed-off-by: wchen61 <wchen61@foxmail.com>
2024-11-15 08:52:20 +00:00
Simon Mo
02dbf30e9a
[Build] skip renaming files for release wheels pipeline (#9671)
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.6.4
2024-11-14 23:31:52 -08:00
Cyrus Leung
2ac6d0e75b
[Misc] Consolidate pooler config overrides (#10351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
Sky Lee
2ec8827288
[Bugfix] Qwen-vl output is inconsistent in speculative decoding (#10350)
2024-11-15 05:40:10 +00:00
Cyrus Leung
b40cf6402e
[Model] Support Qwen2 embeddings and use tags to select model tests (#10184)
2024-11-14 20:23:09 -08:00
Tyler Michael Smith
2885ba0e24
[Misc] Change RedundantReshapesPass and FusionPass logging from info to debug (#10308)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-15 02:44:26 +00:00
Luka Govedič
bf2ddc6610
[bugfix] Fix static asymmetric quantization case (#10334)
Signed-off-by: Daniël de Kok <me@danieldk.eu>
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-11-15 09:35:11 +08:00
Cyrus Leung
972112d82f
[Bugfix] Fix unable to load some models (#10312)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-14 16:55:54 -08:00
Patrick von Platen
11cd1ae6ad
[Tool parsing] Improve / correct mistral tool parsing (#10333)
2024-11-15 00:42:49 +00:00
Zijin Xiao
554af9228d
[Bugfix] use AF_INET6 for OpenAI Compatible Server with ipv6 (#9583)
Signed-off-by: xiaozijin <xiaozijin@bytedance.com>
2024-11-14 16:38:53 -08:00
Murali Andoorveedu
b2e0ad3b59
[Perf] Reduce peak memory usage of llama (#10339)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
2024-11-15 00:38:20 +00:00
Maximilien de Bayser
4a18fd14ba
Support Roberta embedding models (#9387)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-14 21:23:29 +00:00
Woosuk Kwon
1dbae0329c
[Docs] Publish meetup slides (#10331)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-14 16:19:38 +00:00
Cyrus Leung
675d603400
[CI/Build] Make shellcheck happy (#10285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-14 09:47:53 +00:00
Isotr0py
03025c023f
[CI/Build] Fix CPU CI online inference timeout (#10314)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-14 16:45:32 +08:00
youkaichao
29f3ef26a3
[ci][distributed] disable hanging tests (#10317)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-14 00:23:39 -08:00
B-201
294bf467ba
[Model] Add BNB quantization support for Idefics3 (#10310)
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-14 06:31:44 +00:00
Guillaume Calmettes
52b48c1ead
[BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-14 04:48:16 +00:00
Mike Depinet
f67ce05d0b
[Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
2024-11-14 04:14:34 +00:00
Russell Bryant
e0853b6508
[Misc] format.sh: Simplify tool_version_check (#10305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-14 11:12:35 +08:00
youkaichao
504ac53d18
[misc] error early for old-style class (#10304)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-13 18:55:39 -08:00
Isotr0py
15bb8330aa
[Bugfix] Fix tensor parallel for qwen2 classification model (#10297)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-14 10:54:59 +08:00
HoangCongDuc
ac49b59d8b
[Bugfix] bitsandbytes models fail to run pipeline parallel (#10200)
Signed-off-by: Hoang Cong Duc <hoangcongducltt@gmail.com>
2024-11-13 09:56:39 -07:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
Roger Wang
bb7991aa29
[V1] Add missing tokenizer options for Detokenizer (#10288)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-13 11:02:56 +00:00
B-201
d909acf9fe
[Model][LoRA]LoRA support added for idefics3 (#10281)
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-13 17:25:59 +08:00
Pavani Majety
b6dde33019
[Core] Flashinfer - Remove advance step size restriction (#10282)
2024-11-13 16:29:32 +08:00
Austin Veselka
1b886aa104
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944)
Signed-off-by: FurtherAI <austin.veselka@lighton.ai>
Co-authored-by: FurtherAI <austin.veselka@lighton.ai>
2024-11-13 08:28:13 +00:00
电脑星人
3945c82346
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221)
Signed-off-by: imkero <kerorek@outlook.com>
2024-11-13 07:07:22 +00:00
Xin Yang
032fcf16ae
[Doc] Fix typo in arg_utils.py (#10264)
Signed-off-by: Xin Yang <xyang19@gmail.com>
2024-11-12 21:54:52 -08:00
Dipika Sikka
56a955e774
Bump to compressed-tensors v0.8.0 (#10279)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2024-11-12 21:54:10 -08:00
Woosuk Kwon
bbd3e86926
[V1] Support VLMs with fine-grained scheduling (#9871)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-11-13 04:53:13 +00:00
youkaichao
0d4ea3fb5c
[core][distributed] use tcp store directly (#10275)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-12 17:36:08 -08:00
Woosuk Kwon
112fa0bbe5
[V1] Fix CI tests on V1 engine (#10272)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-12 16:17:20 -08:00
youkaichao
377b74fe87
Revert "[ci][build] limit cmake version" (#10271)
2024-11-12 15:06:48 -08:00
youkaichao
18081451f9
[doc] improve debugging doc (#10270)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-12 14:43:52 -08:00
youkaichao
96ae0eaeb2
[doc] fix location of runllm widget (#10266)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-12 14:34:39 -08:00
Woosuk Kwon
1f55e05713
[V1] Enable Inductor when using piecewise CUDA graphs (#10268)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-12 13:39:56 -08:00
Umesh
8a06428c70
[LoRA] Adds support for bias in LoRA (#5733)
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
2024-11-12 11:08:40 -08:00
sroy745
b41fb9d3b1
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-12 10:53:57 -08:00
Woosuk Kwon
7c65527918
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-12 08:57:14 -08:00