3323 Commits

Author SHA1 Message Date
Russell Bryant
3a7f15a398
[Doc] Move CONTRIBUTING to docs site (#9924)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-08 05:15:12 +00:00
Mengqing Cao
7371749d54
[Misc] Fix ImportError causing by triton (#9493) 2024-11-08 05:08:51 +00:00
DearPlanet
ad39bd640c
[Bugfix] Add error handling when server cannot respond any valid tokens (#5895) 2024-11-08 04:58:37 +00:00
whyiug
40d0e7411d
[Doc] Update FAQ links in spec_decode.rst (#9662)
Signed-off-by: whyiug <whyiug@hotmail.com>
2024-11-08 04:44:58 +00:00
Russell Bryant
6bb52b0f97
[CI/Build] Give PR cleanup job PR write access (#10139)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-08 12:10:20 +08:00
Cody Yu
201fc07730
[V1] Prefix caching (take 2) (#9972)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-07 17:34:44 -08:00
Woosuk Kwon
42b4f46b71
[V1] Add all_token_ids attribute to Request (#10135)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-07 17:08:24 -08:00
Jiangtao Hu
073a472728
[Misc] report relevant env vars in collect_env.py tool (#9293) 2024-11-07 16:14:01 -08:00
dependabot[bot]
93bff421bc
Bump actions/checkout from 4.2.1 to 4.2.2 (#9746)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-07 21:44:58 +00:00
litianjian
28b2877d30
Online video support for VLMs (#10020)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 20:25:59 +00:00
dependabot[bot]
97b8475beb
Bump actions/setup-python from 5.2.0 to 5.3.0 (#9745)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-07 18:55:35 +00:00
Russell Bryant
a2f1f3b089
[CI/Build] Automate PR body text cleanup (#10082)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:26:28 +00:00
Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Russell Bryant
de0e61a323
[CI/Build] Always run mypy (#10122)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 16:43:16 +00:00
Nicolò Lucchesi
9d43afcc53
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291)
Signed-off-by: NickLucche <nlucches@redhat.com>
2024-11-07 08:15:14 -08:00
Maximilien de Bayser
ae62fd17c0
[Frontend] Tool calling parser for Granite 3.0 models (#9027)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 07:09:02 -08:00
Atlas
a62bc0109c
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105)
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-11-07 11:20:30 +00:00
Jiahao Li
999df95b4e
[Bugfix] Make image processor respect mm_processor_kwargs for Qwen2-VL (#10112)
Signed-off-by: Jiahao Li <liplus17@163.com>
2024-11-07 10:50:44 +00:00
Li, Jiang
a6f332d0d9
[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 18:42:50 +08:00
Lei Yang
0dfba97b42
[Frontend] Fix multiple values for keyword argument error (#10075) (#10076)
Signed-off-by: Lei <ylxx@live.com>
2024-11-07 09:07:19 +00:00
Flávia Béo
aa9078fa03
Adds method to read the pooling types from model's files (#9506)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 08:42:40 +00:00
Russell Bryant
e036e527a0
[CI/Build] Improve mypy + python version matrix (#10041)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 07:54:16 +00:00
Hanzhi Zhou
6192e9b8fe
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030)
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
2024-11-06 23:50:47 -08:00
Rafael Vasquez
d7263a1bb8
Doc: Improve benchmark documentation (#9927)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-11-06 23:50:35 -08:00
Russell Bryant
104d729656
[CI/Build] re-add codespell to CI (#10083)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-06 22:54:46 -08:00
Cyrus Leung
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config (#10104)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
Nick Hill
1fa020c539
[V1][BugFix] Fix Generator construction in greedy + seed case (#10097)
Signed-off-by: Nick Hill <nhill@redhat.com>
2024-11-07 05:06:57 +00:00
youkaichao
e7b84c394d
[doc] add back Python 3.8 ABI (#10100)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-06 21:06:41 -08:00
Li, Jiang
a4b3e0c1e9
[Hardware][CPU] Update torch 2.5 (#9911)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
Nick Hill
29862b884b
[Frontend] Adjust try/except blocks in API impl (#10056)
Signed-off-by: Nick Hill <nhill@redhat.com>
2024-11-06 20:07:51 -08:00
Yan Ma
d3859f1891
[Misc][XPU] Upgrade to Pytorch 2.5 for xpu backend (#9823)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-11-06 17:29:03 -08:00
Michael Goin
4ab3256644
[Bugfix] Fix FP8 torch._scaled_mm fallback for torch>2.5 with CUDA<12.4 (#10095)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-07 00:54:13 +00:00
youkaichao
719c1ca468
[core][distributed] add stateless_init_process_group (#10072)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-06 16:42:09 -08:00
Russell Bryant
74f2f8a0f1
[CI/Build] Always run the ruff workflow (#10092)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-06 22:25:23 +00:00
Joe Runde
d58268c56a
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-06 11:57:35 -08:00
Russell Bryant
87bd7e0515
[CI/Build] change conflict PR comment from mergify (#10080)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-06 10:15:42 -08:00
Russell Bryant
098f94de42
[CI/Build] Drop Python 3.8 support (#10038)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-06 14:31:01 +00:00
Michael Goin
399c798608
Remove ScaledActivation for AWQ (#10057)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-06 14:27:06 +00:00
Eric
406d4cc480
[Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022)
Signed-off-by: ericperfect <ericperfectttt@gmail.com>
2024-11-06 14:13:15 +00:00
Jee Jee Li
a5bba7d234
[Model] Add Idefics3 support (#9767)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
2024-11-06 11:41:17 +00:00
Jee Jee Li
2003cc3513
[Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-06 09:49:19 +00:00
Woosuk Kwon
6a585a23d2
[Hotfix] Fix ruff errors (#10073)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-06 01:24:28 -08:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
Isotr0py
a5fda50a10
[CI/Build] Fix large_gpu_mark reason (#10070)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-06 08:50:37 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
youkaichao
4be3a45158
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-05 22:35:03 -08:00
Woosuk Kwon
4089985552
[V1] Integrate Piecewise CUDA graphs (#10058)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-05 22:16:04 -08:00
zifeitong
9d59b75593
[Bugfix] Remove CustomChatCompletionContentPartParam multimodal input type (#10054)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
2024-11-06 05:13:09 +00:00
arakowsk-amd
ea928f608c
[Bugfix] Gpt-j-6B patch kv_scale to k_scale path (#10063)
Signed-off-by: Alex Rakowski <alex.rakowski@amd.com>
Signed-off-by: Alex Rakowski <182798202+arakowsk-amd@users.noreply.github.com>
2024-11-06 05:10:40 +00:00
Travis Johnson
2bcbae704c
[Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-06 04:28:29 +00:00