xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-24 02:17:53 +08:00

Author	SHA1	Message	Date
Cyrus Leung	eed11ebee9	[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-04 11:40:53 +00:00
Yan Burman	300acb8347	[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233 ) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai>	2025-01-04 14:50:16 +08:00
xcnick	d91457d529	[V1] Add kv cache utils tests. (#11513 ) Signed-off-by: xcnick <xcnick0412@gmail.com>	2025-01-04 14:49:46 +08:00
Kunshang Ji	fbf2564554	[V1] Add `RayExecutor` support for `AsyncLLM` (api server) (#11712 )	2025-01-04 06:41:31 +00:00
Alberto Ferrer	d1d49397e7	Update bnb.md with example for OpenAI (#11718 )	2025-01-04 06:29:02 +00:00
Hust_YangXian	9c93636d84	Update tool_calling.md (#11701 )	2025-01-04 06:16:30 +00:00
WangErXiao	e5d7ed0c53	[V1] log GPU blocks num for MultiprocExecutor (#11656 )	2025-01-04 00:13:12 +00:00
Robert Shaw	ad0d567e1c	[V1] Chore: cruft removal (#11724 )	2025-01-03 23:25:02 +00:00
Michael Goin	bf0d97d786	Update requirements-tpu.txt to support python 3.9 and 3.11 (#11695 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-01-03 22:36:46 +00:00
Jee Jee Li	a655eb3025	[Misc]Add BNB quantization for Qwen2VL (#11719 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-03 15:19:02 -07:00
Robert Shaw	1543914c04	[V1] Improve TP>1 Error Handling + Stack Trace (#11721 ) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-01-03 21:29:11 +00:00
ZincCat	61fed92c7e	[Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708 ) Signed-off-by: ZincCat <zincchloride@outlook.com>	2025-01-03 21:02:34 +00:00
Robert Shaw	80c751e7f6	[V1] Simplify Shutdown (#11659 )	2025-01-03 17:25:38 +00:00
Aurick Qiao	e1a5c2f0a1	[Model] Whisper model implementation (#11280 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-01-03 16:39:19 +08:00
Kevin H. Luu	fd3a62a122	[perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710 )	2025-01-02 22:38:37 -08:00
Lu Fang	07064cb1d4	[Bugfix] Check chain_speculative_sampling before calling it (#11673 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-01-02 16:58:56 -08:00
Sachin Varghese	2f1e8e8f54	Update default max_num_batch_tokens for chunked prefill (#11694 )	2025-01-03 00:25:53 +00:00
Nathan Azrak	68d37809b9	[Misc] Minimum requirements for SageMaker compatibility (#11576 )	2025-01-02 15:59:25 -08:00
wchen61	5dba257506	Resolve race conditions in Marlin kernel (#11493 ) Signed-off-by: wchen61 <wchen61@foxmail.com>	2025-01-02 22:58:56 +00:00
bjmsong	187e32997c	[Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688 ) Signed-off-by: bjmsong <bjmsong@126.com> Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-02 21:11:39 +00:00
Woosuk Kwon	b55ed6ef8a	[V1][Minor] Optimize token_ids_cpu copy (#11692 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-02 12:04:58 -07:00
Kathy Yu	2f385183f3	[Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (#10013 ) Signed-off-by: Kathy Yu <feiyangyu@google.com>	2025-01-02 10:28:09 -08:00
Chunyang Wen	84c35c374a	According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689 )	2025-01-02 18:14:16 +00:00
Cyrus Leung	8c38ee7007	[VLM] Merged multi-modal processor for LLaVA-NeXT (#11682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-02 16:39:27 +00:00
Tobias Pitters	b6087a6bee	[mypy] Pass type checking in vllm/inputs (#11680 ) Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com>	2025-01-02 16:18:15 +00:00
Cyrus Leung	23c1b10a4c	[VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (#11674 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-02 17:00:00 +08:00
Cyrus Leung	a115ac46b5	[VLM] Move supported limits and max tokens to merged multi-modal processor (#11669 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-01 15:44:42 +00:00
Woosuk Kwon	73001445fb	[V1] Implement Cascade Attention (#11635 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-01 21:56:46 +09:00
Kazuhiro Serizawa	6d70198b17	[Doc] Fix typo (#11666 ) Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com>	2025-01-01 08:10:10 +00:00
Lu Fang	f962f426bc	[Misc] Replace space with - in the file names (#11667 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-01-01 07:39:30 +00:00
Jee Jee Li	11d8a091c6	[Misc] Optimize Qwen2-VL LoRA test (#11663 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-01 14:42:23 +08:00
Cyrus Leung	365801fedd	[VLM] Add max-count checking in data parser for single image models (#11661 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-31 22:15:21 -08:00
Joe Runde	4db72e57f6	[Bugfix][Refactor] Unify model management in frontend (#11660 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-01-01 02:21:51 +00:00
Yihua Cheng	0c6f998554	[Benchmark] Add benchmark script for CPU offloading (#11533 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-01-01 00:10:55 +00:00
Roger Wang	e7c7c5e822	[V1][VLM] V1 support for selected single-image models. (#11632 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-31 21:17:22 +00:00
Chen Zhang	8c3230d8c1	[V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646 )	2024-12-31 08:56:01 +00:00
sakunkun	2c5718809b	[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565 )	2024-12-31 06:29:04 +00:00
John Giorgi	82c49d3260	[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 22:15:58 -08:00
Michael Goin	74fa1d123c	[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-31 03:43:54 +00:00
Matthias Vogler	a2a40bcd0d	[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439 ) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 17:33:06 -08:00
Kevin H. Luu	ccb1aabcca	[benchmark] Remove dependency for H100 benchmark step (#11572 )	2024-12-30 12:27:07 -08:00
whyiug	36e7670045	[Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631 )	2024-12-30 18:51:04 +00:00
Robert Shaw	5886aa496e	[V1] [6/N] API Server: Better Shutdown (#11586 )	2024-12-30 15:51:02 +00:00
Cyrus Leung	8d9b6721e7	[VLM] Abstract out multi-modal data parsing in merged processor (#11620 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-30 15:01:35 +00:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Li, Jiang	5dbf854553	[CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-12-30 10:17:04 +00:00
Tyler Michael Smith	970d6d0776	[Build][Kernel] Update CUTLASS to v3.6.0 (#11607 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-30 17:22:13 +08:00
Liangfu Chen	628ec6c17b	[Docker] bump up neuron sdk v2.21 (#11593 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2024-12-30 13:46:14 +08:00
youkaichao	3682e33f9f	[v1] fix compilation cache (#11598 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 04:24:12 +00:00
Michael Goin	0aa38d16f5	Remove print statement in DeepseekScalingRotaryEmbedding (#11604 )	2024-12-29 20:16:46 +00:00

4152 Commits