xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-08 00:57:17 +08:00

Author	SHA1	Message	Date
clark	6e1fba8a73	1. connect_parser set --prefill-addr and --decode-addr are required 2.To more accurately reflect its purpose, we will rename connect.py to disagg_connector.py. Signed-off-by: clark <panf2333@gmail.com>	2025-03-21 08:17:44 +08:00
clark	bfde1688e7	add /v1/completions stream support Signed-off-by: clark <panf2333@gmail.com>	2025-03-21 08:17:44 +08:00
clark	905424ed65	add identity url headers Signed-off-by: clark <panf2333@gmail.com>	2025-03-21 08:15:42 +08:00
clark	5d20f389d6	add vllm connect cmd Signed-off-by: clark <panf2333@gmail.com>	2025-03-21 08:15:42 +08:00
Woosuk Kwon	2b22290ce0	[V1] Add flag to disable cascade attention (#15243 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-20 15:24:16 -07:00
Jason	d8e82bc06d	[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043 ) Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>	2025-03-20 10:01:02 -07:00
Richard Liu	a8f12a63fd	Fix env vars for running Ray distributed backend on GKE (#15166 ) Signed-off-by: Richard Liu <ricliu@google.com>	2025-03-20 14:59:33 +00:00
Cyrus Leung	27261e40a6	[Bugfix] Multi-video inference on LLaVA-Onevision (#15082 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-03-20 14:10:45 +00:00
Wang Ran (汪然)	c607a2652b	Fixing Imprecise Type Annotations (#15192 )	2025-03-20 01:19:55 -07:00
billishyahao	742369d35a	[Frontend][Bugfix] support prefill decode disaggregation on deepseek (#14824 ) Signed-off-by: billishyahao <bill.he@amd.com> Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com>	2025-03-20 00:00:33 -07:00
Wang Ran (汪然)	bfe2fe0af4	typo: Update config.py (#15189 )	2025-03-19 23:31:21 -07:00
Matt Ritter	a8652f4f0f	Enable CUDA graph support for llama 3.2 vision (#14917 ) Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>	2025-03-19 23:29:16 -07:00
Mickaël Seznec	a597a57595	[Attention] Flash Attention 3 - fp8 (#14570 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-03-20 01:14:20 -04:00
Chauncey	ae65f3e237	[Misc]fixed disable these http request logs (#14754 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-03-19 21:53:40 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Nicolò Lucchesi	d8c6d7d6b5	[V1][TPU] Support V1 Sampler for ragged attention (#14227 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-19 21:00:39 -07:00
Cyrus Leung	ffa443afed	[Bugfix] Fix embedding assignment for InternVL-based models (#15086 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-20 03:40:13 +00:00
Nick Hill	c47aafa37c	[BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-20 01:30:43 +00:00
Alexander Matveev	cfbca8a2f2	[V1] TPU - Tensor parallel MP support (#15059 )	2025-03-20 00:55:18 +00:00
Nick Hill	22d33baca2	[FrontEnd][Perf] `merge_async_iterators` fast-path for single-prompt requests (#15150 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-19 21:04:41 +00:00
iefgnoix	b0e96aaebb	[V1][TPU] Change kv cache shape. (#15145 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-03-19 12:16:42 -07:00
Wang Ran (汪然)	8310e0b59b	simple bugfix: Update stats.py (#15139 )	2025-03-19 18:26:27 +00:00
maobaolong	26dd972adb	[FEAT]Support reset prefix cache by specified device (#15003 )	2025-03-19 10:54:41 -07:00
Alessandro Sangiorgi	374ee287d8	[Frontend] Remove custom_cache_manager (#13791 ) Signed-off-by: fulvius31 <asangior@redhat.com>	2025-03-20 00:13:50 +08:00
Jan Kaniecki	8363cd093d	[Bugfix] Adjust mllama to regional compilation (#15112 ) Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>	2025-03-19 07:57:25 -07:00
Cyrus Leung	3d446433ec	[Bugfix] Fix size calculation of processing cache (#15114 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 05:53:19 -07:00
Cyrus Leung	1fe0fd12d3	[Misc] Avoid unnecessary HF `do_rescale` warning when passing dummy data (#15107 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 03:42:31 -07:00
Roger Wang	dafb4e504a	[V1][Bugfix] Fix oracle for device checking (#15104 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-19 18:35:32 +08:00
Cyrus Leung	61f412187d	[Bugfix] Re-enable Gemma3 for V1 (#14980 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-18 23:58:22 -07:00
Woosuk Kwon	05ccd0aa35	[V1] Ensure using int64 for sampled token ids (#15065 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 23:52:19 -07:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
Brayden Zhong	8b3e94a357	[Model] Remove duplicated message check in Mistral chat completion request (#15069 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-19 05:09:32 +00:00
Julien Denize	437f9162d0	[Model] Pixtral: Remove layer instantiation duplication (#15053 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-03-19 10:34:03 +08:00
Cody Yu	4f065f12f5	[Misc][V1] Skip device checking if not available (#15061 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-18 19:33:43 -07:00
Chujie Zheng	027827cc1d	fix long dtype in topk sampling (#15049 )	2025-03-18 15:57:31 -07:00
Woosuk Kwon	99abb8b650	[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 14:31:54 -07:00
Russell Bryant	3a1e648158	[V1] Refactor Structured Output for multiple backends (#14694 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-18 19:49:15 +00:00
Jee Jee Li	46c759c165	[Bugfix] Fix LoRA extra vocab size (#15047 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 09:40:29 -07:00
Isotr0py	179a619c21	[Bugfix] Fix broken CPU quantization due to triton import (#15038 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-18 08:57:39 -07:00
yury-tokpanov	452e8fd968	[MODEL] Add support for Zamba2 models (#13185 ) Signed-off-by: Yury Tokpanov <yury@zyphra.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-18 08:56:21 -07:00
ekuznetsov139	8b793f7ec6	MI325 configs, fused_moe_kernel bugfix (#14987 ) Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com>	2025-03-18 08:05:18 -07:00
Nicolò Lucchesi	af35d3a3cc	[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-18 07:34:45 -07:00
Simon Mo	3b457143d2	[Bugfix] Register serializers for V0 MQ Engine (#15009 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-18 09:14:47 -04:00
Cyrus Leung	ab656f2c2f	[Bugfix] Loosen type check to avoid errors in V1 (#15021 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-18 12:54:40 +00:00
Sebastian Schoennenbeck	dd732028f5	[Bugfix][Frontend] Fix validation of `logprobs` in `ChatCompletionRequest` (#14352 ) Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>	2025-03-18 05:50:05 -07:00
hoshi-hiyouga	414919138b	[Bugfix] torchrun compatibility (#14899 ) Signed-off-by: hiyouga <hiyouga@buaa.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-18 05:49:27 -07:00
Jee Jee Li	db7c8ca910	[Misc] Embedding model support LoRA (#14935 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 12:07:00 +00:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
Tristan Leclercq	5eeabc2a44	[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950 )	2025-03-17 23:27:26 +00:00
Robert Shaw	e41e160263	[V1] Guard Against Main Thread Usage (#14972 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-17 13:23:02 -07:00


3583 Commits