Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 ) ( #8810 )
2024-09-25 10:36:26 -07:00
Michael Goin
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors ( #8588 )
2024-09-25 09:43:36 -07:00
科英
64840dfae4
[Frontend] MQLLMEngine supports profiling. ( #8761 )
2024-09-25 09:37:41 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
DefTruth
0c4d2ad5e6
[VLM][Bugfix] internvl with num_scheduler_steps > 1 ( #8614 )
2024-09-25 09:35:53 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests ( #8479 )
2024-09-25 08:35:52 -06:00
Woo-Yeon Lee
8fae5ed7f6
[Misc] Fix minor typo in scheduler ( #8765 )
2024-09-25 00:53:03 -07:00
David Newman
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node ( #8767 )
...
Signed-off-by: darthhexx <darthhexx@gmail.com>
2024-09-25 00:52:26 -07:00
Adam Tilghman
1ac3de09cd
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer ( #8672 )
2024-09-25 07:49:26 +00:00
sohamparikh
3e073e66f1
[Bugfix] load fc bias from config for eagle ( #8790 )
2024-09-24 23:16:30 -07:00
Isotr0py
c23953675f
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend ( #8770 )
2024-09-24 23:16:11 -07:00
zifeitong
e3dd0692fa
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv ( #8250 )
2024-09-25 05:53:43 +00:00
Joe Runde
6e0c9d6bd0
[Bugfix] Use heartbeats instead of health checks ( #8583 )
2024-09-24 20:37:38 -07:00
Archit Patke
6da1ab6b41
[Core] Adding Priority Scheduling ( #5958 )
2024-09-24 19:50:50 -07:00
Travis Johnson
01b6f9e1f0
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding ( #8047 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-09-24 17:29:56 -07:00
Jee Jee Li
13f9f7a3d0
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00
youkaichao
1e7d5c01f5
[misc] soft drop beam search ( #8763 )
2024-09-24 15:48:39 -07:00
Lucas Wilkinson
72fc97a0f1
[Bugfix] Fix torch dynamo fixes caused by replace_parameters ( #8748 )
2024-09-24 14:33:21 -04:00
Andy
2529d09b5a
[Frontend] Batch inference for llm.chat() API ( #8648 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-09-24 09:44:11 -07:00
Alex Brooks
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg ( #8658 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-24 07:36:46 +00:00
Peter Salas
3f06bae907
[Core][Model] Support loading weights by ID within models ( #7931 )
2024-09-24 07:14:15 +00:00
Cody Yu
b8747e8a7c
[MISC] Skip dumping inputs when unpicklable ( #8744 )
2024-09-24 06:10:03 +00:00
Simon Mo
3185fb0cca
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" ( #8750 )
2024-09-24 05:45:20 +00:00
youkaichao
0250dd68c5
re-implement beam search on top of vllm core ( #8726 )
...
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-09-23 22:08:12 -07:00
Alexander Matveev
1a2aef3e59
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse ( #8335 )
2024-09-23 15:38:04 -07:00
jiqing-feng
5f7bb58427
Fix typical acceptance sampler with correct recovered token ids ( #8562 )
2024-09-23 12:32:27 -07:00
Russell Bryant
b05f5c9238
[Core] Allow IPv6 in VLLM_HOST_IP with zmq ( #8575 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-23 12:15:41 -07:00
Jee Jee Li
9b0e3ec970
[Kernel][LoRA] Add assertion for punica sgmv kernels ( #7585 )
2024-09-23 18:57:42 +00:00
Lucas Wilkinson
86e9c8df29
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin ( #7701 )
...
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-23 13:46:26 -04:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ ( #4738 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Jani Monoses
f2bd246c17
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size ( #8707 )
2024-09-23 14:43:09 +00:00
Yanyi Liu
a79e522984
[Model] Support pp for qwen2-vl ( #8696 )
2024-09-23 13:46:59 +00:00
Li, Jiang
3e83c12b5c
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner ( #8733 )
2024-09-23 13:15:16 +00:00
Isotr0py
e551ca1555
[Hardware][CPU] Refactor CPU model runner ( #8729 )
2024-09-23 20:12:20 +08:00
Alex Brooks
9b8c8ba119
[Core][Frontend] Support Passing Multimodal Processor Kwargs ( #8657 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-23 07:44:48 +00:00
Lily Liu
c6bd70d772
[SpecDec][Misc] Cleanup, remove bonus token logic. ( #8701 )
2024-09-22 12:34:14 -07:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support ( #8486 )
...
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Huazhong Ji
ca2b628b3c
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler ( #8703 )
2024-09-22 10:44:09 -07:00
Cyrus Leung
06ed2815e2
[Model] Refactor BLIP/BLIP-2 to support composite model loading ( #8407 )
2024-09-22 12:24:21 +00:00
youkaichao
0e40ac9b7b
[ci][build] fix vllm-flash-attn ( #8699 )
2024-09-21 23:24:58 -07:00
Isotr0py
13d88d4137
[Bugfix] Refactor composite weight loading logic ( #8656 )
2024-09-22 04:33:27 +00:00
Divakar Verma
9dc7c6c7f3
[dbrx] refactor dbrx experts to extend FusedMoe class ( #8518 )
2024-09-21 15:09:39 -06:00
rasmith
ec4aaad812
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 ( #8646 )
2024-09-21 09:20:54 +00:00
Cyrus Leung
5e85f4f82a
[VLM] Use SequenceData.from_token_counts to create dummy data ( #8687 )
2024-09-20 23:28:56 -07:00
Luka Govedič
71c60491f2
[Kernel] Build flash-attn from source ( #8245 )
2024-09-20 23:27:10 -07:00
Cyrus Leung
0455c46ed4
[Core] Factor out common code in SequenceData and Sequence ( #8675 )
2024-09-21 02:30:39 +00:00
Kunshang Ji
d4bf085ad0
[MISC] add support custom_op check ( #8557 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-20 19:03:55 -07:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs and inputs( #8673 )
2024-09-20 19:00:54 -07:00
zyddnys
0f961b3ce9
[Bugfix] Fix incorrect llava next feature size calculation ( #8496 )
2024-09-20 22:48:32 +00:00