Cyrus Leung
|
eed11ebee9
|
[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-04 11:40:53 +00:00 |
|
Yan Burman
|
300acb8347
|
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233)
Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
|
2025-01-04 14:50:16 +08:00 |
|
xcnick
|
d91457d529
|
[V1] Add kv cache utils tests. (#11513)
Signed-off-by: xcnick <xcnick0412@gmail.com>
|
2025-01-04 14:49:46 +08:00 |
|
Kunshang Ji
|
fbf2564554
|
[V1] Add RayExecutor support for AsyncLLM (api server) (#11712)
|
2025-01-04 06:41:31 +00:00 |
|
Alberto Ferrer
|
d1d49397e7
|
Update bnb.md with example for OpenAI (#11718)
|
2025-01-04 06:29:02 +00:00 |
|
Hust_YangXian
|
9c93636d84
|
Update tool_calling.md (#11701)
|
2025-01-04 06:16:30 +00:00 |
|
WangErXiao
|
e5d7ed0c53
|
[V1] log GPU blocks num for MultiprocExecutor (#11656)
|
2025-01-04 00:13:12 +00:00 |
|
Robert Shaw
|
ad0d567e1c
|
[V1] Chore: cruft removal (#11724)
|
2025-01-03 23:25:02 +00:00 |
|
Michael Goin
|
bf0d97d786
|
Update requirements-tpu.txt to support python 3.9 and 3.11 (#11695)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-01-03 22:36:46 +00:00 |
|
Jee Jee Li
|
a655eb3025
|
[Misc] Add BNB quantization for Qwen2VL (#11719)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-03 15:19:02 -07:00 |
|
Robert Shaw
|
1543914c04
|
[V1] Improve TP>1 Error Handling + Stack Trace (#11721)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-01-03 21:29:11 +00:00 |
|
ZincCat
|
61fed92c7e
|
[Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708)
Signed-off-by: ZincCat <zincchloride@outlook.com>
|
2025-01-03 21:02:34 +00:00 |
|
Robert Shaw
|
80c751e7f6
|
[V1] Simplify Shutdown (#11659)
|
2025-01-03 17:25:38 +00:00 |
|
Aurick Qiao
|
e1a5c2f0a1
|
[Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2025-01-03 16:39:19 +08:00 |
|
Kevin H. Luu
|
fd3a62a122
|
[perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710)
|
2025-01-02 22:38:37 -08:00 |
|
Lu Fang
|
07064cb1d4
|
[Bugfix] Check chain_speculative_sampling before calling it (#11673)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-01-02 16:58:56 -08:00 |
|
Sachin Varghese
|
2f1e8e8f54
|
Update default max_num_batch_tokens for chunked prefill (#11694)
|
2025-01-03 00:25:53 +00:00 |
|
Nathan Azrak
|
68d37809b9
|
[Misc] Minimum requirements for SageMaker compatibility (#11576)
|
2025-01-02 15:59:25 -08:00 |
|
wchen61
|
5dba257506
|
Resolve race conditions in Marlin kernel (#11493)
Signed-off-by: wchen61 <wchen61@foxmail.com>
|
2025-01-02 22:58:56 +00:00 |
|
bjmsong
|
187e32997c
|
[Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688)
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
|
2025-01-02 21:11:39 +00:00 |
|
Woosuk Kwon
|
b55ed6ef8a
|
[V1][Minor] Optimize token_ids_cpu copy (#11692)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-02 12:04:58 -07:00 |
|
Kathy Yu
|
2f385183f3
|
[Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (#10013)
Signed-off-by: Kathy Yu <feiyangyu@google.com>
|
2025-01-02 10:28:09 -08:00 |
|
Chunyang Wen
|
84c35c374a
|
According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689)
|
2025-01-02 18:14:16 +00:00 |
|
Cyrus Leung
|
8c38ee7007
|
[VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-02 16:39:27 +00:00 |
|
Tobias Pitters
|
b6087a6bee
|
[mypy] Pass type checking in vllm/inputs (#11680)
Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com>
|
2025-01-02 16:18:15 +00:00 |
|
Cyrus Leung
|
23c1b10a4c
|
[VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (#11674)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-02 17:00:00 +08:00 |
|
Cyrus Leung
|
a115ac46b5
|
[VLM] Move supported limits and max tokens to merged multi-modal processor (#11669)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-01 15:44:42 +00:00 |
|
Woosuk Kwon
|
73001445fb
|
[V1] Implement Cascade Attention (#11635)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-01 21:56:46 +09:00 |
|
Kazuhiro Serizawa
|
6d70198b17
|
[Doc] Fix typo (#11666)
Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com>
|
2025-01-01 08:10:10 +00:00 |
|
Lu Fang
|
f962f426bc
|
[Misc] Replace space with - in the file names (#11667)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-01-01 07:39:30 +00:00 |
|
Jee Jee Li
|
11d8a091c6
|
[Misc] Optimize Qwen2-VL LoRA test (#11663)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-01 14:42:23 +08:00 |
|
Cyrus Leung
|
365801fedd
|
[VLM] Add max-count checking in data parser for single image models (#11661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-31 22:15:21 -08:00 |
|
Joe Runde
|
4db72e57f6
|
[Bugfix][Refactor] Unify model management in frontend (#11660)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-01-01 02:21:51 +00:00 |
|
Yihua Cheng
|
0c6f998554
|
[Benchmark] Add benchmark script for CPU offloading (#11533)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-01-01 00:10:55 +00:00 |
|
Roger Wang
|
e7c7c5e822
|
[V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-31 21:17:22 +00:00 |
|
Chen Zhang
|
8c3230d8c1
|
[V1] Simplify vision block hash for prefix caching by removing offset from hash (#11646)
|
2024-12-31 08:56:01 +00:00 |
|
sakunkun
|
2c5718809b
|
[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565)
|
2024-12-31 06:29:04 +00:00 |
|
John Giorgi
|
82c49d3260
|
[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-30 22:15:58 -08:00 |
|
Michael Goin
|
74fa1d123c
|
[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-31 03:43:54 +00:00 |
|
Matthias Vogler
|
a2a40bcd0d
|
[Model][LoRA] LoRA support added for MolmoForCausalLM (#11439)
Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-30 17:33:06 -08:00 |
|
Kevin H. Luu
|
ccb1aabcca
|
[benchmark] Remove dependency for H100 benchmark step (#11572)
|
2024-12-30 12:27:07 -08:00 |
|
whyiug
|
36e7670045
|
[Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631)
|
2024-12-30 18:51:04 +00:00 |
|
Robert Shaw
|
5886aa496e
|
[V1] [6/N] API Server: Better Shutdown (#11586)
|
2024-12-30 15:51:02 +00:00 |
|
Cyrus Leung
|
8d9b6721e7
|
[VLM] Abstract out multi-modal data parsing in merged processor (#11620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-30 15:01:35 +00:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Li, Jiang
|
5dbf854553
|
[CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-12-30 10:17:04 +00:00 |
|
Tyler Michael Smith
|
970d6d0776
|
[Build][Kernel] Update CUTLASS to v3.6.0 (#11607)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-30 17:22:13 +08:00 |
|
Liangfu Chen
|
628ec6c17b
|
[Docker] bump up neuron sdk v2.21 (#11593)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2024-12-30 13:46:14 +08:00 |
|
youkaichao
|
3682e33f9f
|
[v1] fix compilation cache (#11598)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 04:24:12 +00:00 |
|
Michael Goin
|
0aa38d16f5
|
Remove print statement in DeepseekScalingRotaryEmbedding (#11604)
|
2024-12-29 20:16:46 +00:00 |
|