YiSheng5
d93d2d74fd
[XPU] Make pp group initilized for pipeline-parallelism ( #11648 )
...
Signed-off-by: yisheng <yi.sheng@intel.com>
2025-01-07 11:09:58 +08:00
Cyrus Leung
d0169e1b0f
[Model] Future-proof Qwen2-Audio multi-modal processor ( #11776 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 11:05:17 +08:00
Cyrus Leung
08fb75c72e
[Bugfix] Fix LLaVA-NeXT feature size precision error (for real) ( #11772 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 01:10:54 +00:00
Roger Wang
91b361ae89
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision ( #11685 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 19:58:16 +00:00
Chen Zhang
e20c92bb61
[Kernel] Move attn_type to Attention.__init__() ( #11690 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-07 00:11:28 +08:00
Jee Jee Li
32c9eff2ff
[Bugfix][V1] Fix molmo text-only inputs ( #11676 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-06 15:22:25 +00:00
youkaichao
4ca5d40adc
[doc] explain how to add interleaving sliding window support ( #11771 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-06 21:57:44 +08:00
Roger Wang
9279b9f83d
[Bugfix] Fix max image size for LLaVA-Onevision ( #11769 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-06 13:48:53 +00:00
Cyrus Leung
ee77fdb5de
[Doc][2/N] Reorganize Models and Usage sections ( #11755 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 21:40:31 +08:00
Cyrus Leung
996357e480
[VLM] Separate out profiling-related logic ( #11746 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 16:02:21 +08:00
Suraj Deshmukh
2a622d704a
k8s-config: Update the secret to use stringData ( #11679 )
...
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2025-01-06 08:01:22 +00:00
Lucas Tucker
9c749713f6
[mypy] Forward pass function type hints in lora ( #11740 )
...
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
2025-01-06 07:59:36 +00:00
Rui Qiao
022c5c6944
[V1] Refactor get_executor_cls ( #11754 )
2025-01-06 07:59:16 +00:00
Rui Qiao
f8fcca100b
[Misc] Fix typo for valid_tool_parses ( #11753 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-01-06 07:12:38 +00:00
Woosuk Kwon
06bfb51963
[V1] Add BlockTable class ( #11693 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-06 14:24:42 +09:00
Cody Yu
408e560015
[Bugfix] Remove block size constraint ( #11723 )
2025-01-06 12:49:55 +08:00
Cyrus Leung
402d378360
[Doc] [1/N] Reorganize Getting Started section ( #11645 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 02:18:33 +00:00
cennn
9e764e7b10
[distributed] remove pynccl's redundant change_state ( #11749 )
2025-01-06 09:05:48 +08:00
Robert Shaw
33fc1e2e86
[Frontend] Improve StreamingResponse Exception Handling ( #11752 )
2025-01-05 16:35:01 -05:00
Lancer
eba17173d3
fix: [doc] fix typo ( #11751 )
...
Co-authored-by: Lancer <maruixiang6688@gmail.com>
2025-01-06 00:48:16 +08:00
cennn
635b897246
[distributed] remove pynccl's redundant stream ( #11744 )
2025-01-05 23:09:11 +08:00
Lu Fang
4068f4b5b5
[MISC] Replace c10::optional with std::optional ( #11730 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
Jee Jee Li
47831430cc
[Bugfix][V1] Fix test_kv_cache_utils.py ( #11738 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-04 16:07:59 +00:00
Cyrus Leung
65c08928c2
[Model] Remove unnecessary weight initialization logic ( #11736 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-04 23:46:21 +08:00
Cyrus Leung
ba214dffbe
[Bugfix] Fix precision error in LLaVA-NeXT ( #11735 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 23:45:57 +08:00
Cyrus Leung
eed11ebee9
[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision ( #11717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 11:40:53 +00:00
Yan Burman
300acb8347
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture ( #11233 )
...
Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
2025-01-04 14:50:16 +08:00
xcnick
d91457d529
[V1] Add kv cache utils tests. ( #11513 )
...
Signed-off-by: xcnick <xcnick0412@gmail.com>
2025-01-04 14:49:46 +08:00
Kunshang Ji
fbf2564554
[V1] Add RayExecutor support for AsyncLLM (api server) ( #11712 )
2025-01-04 06:41:31 +00:00
Alberto Ferrer
d1d49397e7
Update bnb.md with example for OpenAI ( #11718 )
2025-01-04 06:29:02 +00:00
Hust_YangXian
9c93636d84
Update tool_calling.md ( #11701 )
2025-01-04 06:16:30 +00:00
WangErXiao
e5d7ed0c53
[V1] log GPU blocks num for MultiprocExecutor ( #11656 )
2025-01-04 00:13:12 +00:00
Robert Shaw
ad0d567e1c
[V1] Chore: cruft removal ( #11724 )
2025-01-03 23:25:02 +00:00
Michael Goin
bf0d97d786
Update requirements-tpu.txt to support python 3.9 and 3.11 ( #11695 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-03 22:36:46 +00:00
Jee Jee Li
a655eb3025
[Misc]Add BNB quantization for Qwen2VL ( #11719 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-03 15:19:02 -07:00
Robert Shaw
1543914c04
[V1] Improve TP>1 Error Handling + Stack Trace ( #11721 )
...
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-03 21:29:11 +00:00
ZincCat
61fed92c7e
[Bugfix] Fix ColumnParallelLinearWithLoRA slice ( #11708 )
...
Signed-off-by: ZincCat <zincchloride@outlook.com>
2025-01-03 21:02:34 +00:00
Robert Shaw
80c751e7f6
[V1] Simplify Shutdown ( #11659 )
2025-01-03 17:25:38 +00:00
Aurick Qiao
e1a5c2f0a1
[Model] Whisper model implementation ( #11280 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-01-03 16:39:19 +08:00
Kevin H. Luu
fd3a62a122
[perf-benchmark] Fix dependency for steps in benchmark pipeline ( #11710 )
2025-01-02 22:38:37 -08:00
Lu Fang
07064cb1d4
[Bugfix] Check chain_speculative_sampling before calling it ( #11673 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-02 16:58:56 -08:00
Sachin Varghese
2f1e8e8f54
Update default max_num_batch_tokens for chunked prefill ( #11694 )
2025-01-03 00:25:53 +00:00
Nathan Azrak
68d37809b9
[Misc] Minimum requirements for SageMaker compatibility ( #11576 )
2025-01-02 15:59:25 -08:00
wchen61
5dba257506
Resolve race conditions in Marlin kernel ( #11493 )
...
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-01-02 22:58:56 +00:00
bjmsong
187e32997c
[Bugfix] Change kv scaling factor by param json on nvidia gpu ( #11688 )
...
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
2025-01-02 21:11:39 +00:00
Woosuk Kwon
b55ed6ef8a
[V1][Minor] Optimize token_ids_cpu copy ( #11692 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-02 12:04:58 -07:00
Kathy Yu
2f385183f3
[Bugfix] Free cross attention block table for preempted-for-recompute sequence group. ( #10013 )
...
Signed-off-by: Kathy Yu <feiyangyu@google.com>
2025-01-02 10:28:09 -08:00
Chunyang Wen
84c35c374a
According to vllm.EngineArgs, the name should be distributed_executor_backend ( #11689 )
2025-01-02 18:14:16 +00:00
Cyrus Leung
8c38ee7007
[VLM] Merged multi-modal processor for LLaVA-NeXT ( #11682 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-02 16:39:27 +00:00
Tobias Pitters
b6087a6bee
[mypy] Pass type checking in vllm/inputs ( #11680 )
...
Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com>
2025-01-02 16:18:15 +00:00