2881 Commits

Author SHA1 Message Date
ElizaWszola
d081da0064
[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-28 18:19:40 -07:00
sroy745
5bf8789b2a
[Bugfix] Block manager v2 with preemption and lookahead slots (#8824) 2024-09-29 09:17:45 +08:00
Russell Bryant
d1537039ce
[Core] Improve choice of Python multiprocessing method (#8823)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-29 09:17:07 +08:00
youkaichao
cc276443b5
[doc] organize installation doc and expose per-commit docker (#8931) 2024-09-28 17:48:41 -07:00
Chen Zhang
e585b583a9
[Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891) 2024-09-28 18:51:22 +00:00
Edouard B.
090e945e36
[Frontend] Make beam search emulator temperature modifiable (#8928)
Co-authored-by: Eduard Balzin <nfunctor@yahoo.fr>
2024-09-28 11:30:21 -07:00
Cyrus Leung
e1a3f5e831
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-28 09:54:35 -07:00
Varun Sundar Rabindranath
19d02ff938
[Bugfix] Fix PP for Multi-Step (#8887) 2024-09-28 08:52:46 -07:00
tastelikefeet
39d3f8d94f
[Bugfix] Fix code for downloading models from modelscope (#8443) 2024-09-28 08:24:12 -07:00
Cyrus Leung
b0298aa8cc
[Misc] Remove vLLM patch of BaichuanTokenizer (#8921) 2024-09-28 08:11:25 +00:00
Tyler Titsworth
260024a374
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-27 23:45:50 -07:00
youkaichao
d86f6b2afb
[misc] fix wheel name (#8919) 2024-09-27 22:10:44 -07:00
Sebastian Schoennenbeck
bd429f2b75
[Core] Priority-based scheduling in async engine (#8850) 2024-09-27 15:07:10 -07:00
youkaichao
18e60d7d13
[misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911) 2024-09-27 14:27:56 -07:00
Varun Sundar Rabindranath
c2ec430ab5
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-09-27 13:32:07 -07:00
Lucas Wilkinson
c5d55356f9
[Bugfix] fix for deepseek w4a16 (#8906)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-09-27 13:12:34 -06:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271) 2024-09-27 14:25:10 -04:00
youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling (#8875) 2024-09-27 08:11:32 -07:00
Brittany
8df2dc3c88
[TPU] Update pallas.py to support trillium (#8871) 2024-09-27 01:16:55 -07:00
Isotr0py
6d792d2f31
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892) 2024-09-27 01:15:58 -07:00
Peter Pan
0e088750af
[MISC] Fix invalid escape sequence '\' (#8830)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2024-09-27 01:13:25 -07:00
youkaichao
dc4e3df5c2
[misc] fix collect env (#8894) 2024-09-27 00:26:38 -07:00
Cyrus Leung
3b00b9c26c
[Core] rename PromptInputs and inputs (#8876) 2024-09-26 20:35:15 -07:00
Maximilien de Bayser
344cd2b6f4
[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
Cyrus Leung
1b49148e47
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764) 2024-09-26 16:54:09 -07:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Tyler Michael Smith
71d21c73ab
[Bugfix] Fixup advance_step.cu warning (#8815) 2024-09-26 16:23:45 -07:00
Chirag Jain
ee2da3e9ef
fix validation: Only set tool_choice auto if at least one tool is provided (#8568) 2024-09-26 16:23:17 -07:00
Tyler Michael Smith
e2f6f26e86
[Bugfix] Fix print_warning_once's line info (#8867) 2024-09-26 16:18:26 -07:00
Michael Goin
b28d2104de
[Misc] Change dummy profiling and BOS fallback warns to log once (#8820) 2024-09-26 16:18:14 -07:00
Pernekhan Utemuratov
93d364da34
[Bugfix] Include encoder prompts len to non-stream api usage response (#8861) 2024-09-26 15:47:00 -07:00
Kevin H. Luu
d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-26 15:02:16 -07:00
youkaichao
70de39f6b4
[misc][installation] build from source without compilation (#8818) 2024-09-26 13:19:04 -07:00
fyuan1316
68988d4e0d
[CI/Build] Fix missing ci dependencies (#8834) 2024-09-26 11:04:39 -07:00
Michael Goin
520db4dbc1
[Docs] Add README to the build docker image (#8825) 2024-09-26 11:02:52 -07:00
Tyler Michael Smith
f70bccac75
[Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814) 2024-09-26 10:07:18 -07:00
Roger Wang
4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837) 2024-09-26 07:45:30 -07:00
Michael Goin
7193774b1f
[Misc] Support quantization of MllamaForCausalLM (#8822) v0.6.2 2024-09-25 14:46:22 -07:00
Roger Wang
e2c6e0a829
[Doc] Update doc for Transformers 4.45 (#8817) 2024-09-25 13:29:48 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810) 2024-09-25 10:36:26 -07:00
Michael Goin
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors (#8588) 2024-09-25 09:43:36 -07:00
科英
64840dfae4
[Frontend] MQLLMEngine supports profiling. (#8761) 2024-09-25 09:37:41 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility (#8760) 2024-09-25 09:36:47 -07:00
DefTruth
0c4d2ad5e6
[VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614) 2024-09-25 09:35:53 -07:00
Jee Jee Li
c6f2485c82
[Misc] Add extra deps for openai server image (#8792) 2024-09-25 09:35:23 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests (#8479) 2024-09-25 08:35:52 -06:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777) 2024-09-25 22:26:37 +08:00
Woo-Yeon Lee
8fae5ed7f6
[Misc] Fix minor typo in scheduler (#8765) 2024-09-25 00:53:03 -07:00
David Newman
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
Signed-off-by: darthhexx <darthhexx@gmail.com>
2024-09-25 00:52:26 -07:00