Cyrus Leung | ee77fdb5de | [Doc][2/N] Reorganize Models and Usage sections (#11755); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-06 21:40:31 +08:00
Cyrus Leung | 996357e480 | [VLM] Separate out profiling-related logic (#11746); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-06 16:02:21 +08:00
Suraj Deshmukh | 2a622d704a | k8s-config: Update the secret to use stringData (#11679); Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com> | 2025-01-06 08:01:22 +00:00
Lucas Tucker | 9c749713f6 | [mypy] Forward pass function type hints in lora (#11740); Signed-off-by: lucast2021 <lucast2021@headroyce.org>; Co-authored-by: lucast2021 <lucast2021@headroyce.org> | 2025-01-06 07:59:36 +00:00
Rui Qiao | 022c5c6944 | [V1] Refactor get_executor_cls (#11754) | 2025-01-06 07:59:16 +00:00
Rui Qiao | f8fcca100b | [Misc] Fix typo for valid_tool_parses (#11753); Signed-off-by: Rui Qiao <ruisearch42@gmail.com> | 2025-01-06 07:12:38 +00:00
Woosuk Kwon | 06bfb51963 | [V1] Add BlockTable class (#11693); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-01-06 14:24:42 +09:00
Cody Yu | 408e560015 | [Bugfix] Remove block size constraint (#11723) | 2025-01-06 12:49:55 +08:00
Cyrus Leung | 402d378360 | [Doc] [1/N] Reorganize Getting Started section (#11645); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-06 02:18:33 +00:00
cennn | 9e764e7b10 | [distributed] remove pynccl's redundant change_state (#11749) | 2025-01-06 09:05:48 +08:00
Robert Shaw | 33fc1e2e86 | [Frontend] Improve StreamingResponse Exception Handling (#11752) | 2025-01-05 16:35:01 -05:00
Lancer | eba17173d3 | fix: [doc] fix typo (#11751); Co-authored-by: Lancer <maruixiang6688@gmail.com> | 2025-01-06 00:48:16 +08:00
cennn | 635b897246 | [distributed] remove pynccl's redundant stream (#11744) | 2025-01-05 23:09:11 +08:00
Lu Fang | 4068f4b5b5 | [MISC] Replace c10::optional with std::optional (#11730); Signed-off-by: Lu Fang <lufang@fb.com> | 2025-01-05 10:20:34 +09:00
Jee Jee Li | 47831430cc | [Bugfix][V1] Fix test_kv_cache_utils.py (#11738); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-01-04 16:07:59 +00:00
Cyrus Leung | 65c08928c2 | [Model] Remove unnecessary weight initialization logic (#11736); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Signed-off-by: Isotr0py <2037008807@qq.com>; Co-authored-by: Isotr0py <2037008807@qq.com> | 2025-01-04 23:46:21 +08:00
Cyrus Leung | ba214dffbe | [Bugfix] Fix precision error in LLaVA-NeXT (#11735); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-04 23:45:57 +08:00
Cyrus Leung | eed11ebee9 | [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-04 11:40:53 +00:00
Yan Burman | 300acb8347 | [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233); Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>; Signed-off-by: Ido Asraff <idoa@atero.ai> | 2025-01-04 14:50:16 +08:00
xcnick | d91457d529 | [V1] Add kv cache utils tests. (#11513); Signed-off-by: xcnick <xcnick0412@gmail.com> | 2025-01-04 14:49:46 +08:00
Kunshang Ji | fbf2564554 | [V1] Add RayExecutor support for AsyncLLM (api server) (#11712) | 2025-01-04 06:41:31 +00:00
Alberto Ferrer | d1d49397e7 | Update bnb.md with example for OpenAI (#11718) | 2025-01-04 06:29:02 +00:00
Hust_YangXian | 9c93636d84 | Update tool_calling.md (#11701) | 2025-01-04 06:16:30 +00:00
WangErXiao | e5d7ed0c53 | [V1] log GPU blocks num for MultiprocExecutor (#11656) | 2025-01-04 00:13:12 +00:00
Robert Shaw | ad0d567e1c | [V1] Chore: cruft removal (#11724) | 2025-01-03 23:25:02 +00:00
Michael Goin | bf0d97d786 | Update requirements-tpu.txt to support python 3.9 and 3.11 (#11695); Signed-off-by: mgoin <michael@neuralmagic.com> | 2025-01-03 22:36:46 +00:00
Jee Jee Li | a655eb3025 | [Misc]Add BNB quantization for Qwen2VL (#11719); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>; Signed-off-by: Isotr0py <2037008807@qq.com>; Co-authored-by: Isotr0py <2037008807@qq.com> | 2025-01-03 15:19:02 -07:00
Robert Shaw | 1543914c04 | [V1] Improve TP>1 Error Handling + Stack Trace (#11721); Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2025-01-03 21:29:11 +00:00
ZincCat | 61fed92c7e | [Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708); Signed-off-by: ZincCat <zincchloride@outlook.com> | 2025-01-03 21:02:34 +00:00
Robert Shaw | 80c751e7f6 | [V1] Simplify Shutdown (#11659) | 2025-01-03 17:25:38 +00:00
Aurick Qiao | e1a5c2f0a1 | [Model] Whisper model implementation (#11280); Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> | 2025-01-03 16:39:19 +08:00
Kevin H. Luu | fd3a62a122 | [perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710) | 2025-01-02 22:38:37 -08:00
Lu Fang | 07064cb1d4 | [Bugfix] Check chain_speculative_sampling before calling it (#11673); Signed-off-by: Lu Fang <lufang@fb.com> | 2025-01-02 16:58:56 -08:00
Sachin Varghese | 2f1e8e8f54 | Update default max_num_batch_tokens for chunked prefill (#11694) | 2025-01-03 00:25:53 +00:00
Nathan Azrak | 68d37809b9 | [Misc] Minimum requirements for SageMaker compatibility (#11576) | 2025-01-02 15:59:25 -08:00
wchen61 | 5dba257506 | Resolve race conditions in Marlin kernel (#11493); Signed-off-by: wchen61 <wchen61@foxmail.com> | 2025-01-02 22:58:56 +00:00
bjmsong | 187e32997c | [Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688); Signed-off-by: bjmsong <bjmsong@126.com>; Co-authored-by: bjmsong <bjmsong@126.com> | 2025-01-02 21:11:39 +00:00
Woosuk Kwon | b55ed6ef8a | [V1][Minor] Optimize token_ids_cpu copy (#11692); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-01-02 12:04:58 -07:00
Kathy Yu | 2f385183f3 | [Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (#10013); Signed-off-by: Kathy Yu <feiyangyu@google.com> | 2025-01-02 10:28:09 -08:00
Chunyang Wen | 84c35c374a | According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689) | 2025-01-02 18:14:16 +00:00
Cyrus Leung | 8c38ee7007 | [VLM] Merged multi-modal processor for LLaVA-NeXT (#11682); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-02 16:39:27 +00:00
Tobias Pitters | b6087a6bee | [mypy] Pass type checking in vllm/inputs (#11680); Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com> | 2025-01-02 16:18:15 +00:00
Cyrus Leung | 23c1b10a4c | [VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (#11674); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-02 17:00:00 +08:00
Cyrus Leung | a115ac46b5 | [VLM] Move supported limits and max tokens to merged multi-modal processor (#11669); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Signed-off-by: Isotr0py <2037008807@qq.com>; Co-authored-by: Isotr0py <2037008807@qq.com> | 2025-01-01 15:44:42 +00:00
Woosuk Kwon | 73001445fb | [V1] Implement Cascade Attention (#11635); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-01-01 21:56:46 +09:00
Kazuhiro Serizawa | 6d70198b17 | [Doc] Fix typo (#11666); Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com> | 2025-01-01 08:10:10 +00:00
Lu Fang | f962f426bc | [Misc] Replace space with - in the file names (#11667); Signed-off-by: Lu Fang <lufang@fb.com> | 2025-01-01 07:39:30 +00:00
Jee Jee Li | 11d8a091c6 | [Misc] Optimize Qwen2-VL LoRA test (#11663); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-01-01 14:42:23 +08:00
Cyrus Leung | 365801fedd | [VLM] Add max-count checking in data parser for single image models (#11661); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Signed-off-by: Roger Wang <ywang@roblox.com>; Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-12-31 22:15:21 -08:00
Joe Runde | 4db72e57f6 | [Bugfix][Refactor] Unify model management in frontend (#11660); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2025-01-01 02:21:51 +00:00