xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 09:27:14 +08:00

Author	SHA1	Message	Date
Thomas Parnell	e642ec962c	Add authors to license header. (#14371) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-03-06 08:43:09 -08:00
Dilip Gowda Bhagavan	ada19210a3	Adding cpu inference with VXE ISA for s390x architecture (#12613) Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com> Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>	2025-03-06 08:40:53 -08:00
Harry Mellor	bf0560bda9	Reinstate `best_of` for V0 (#14356) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-06 08:34:22 -08:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00
Jitse Klomp	81b2f4a45f	[Doc] Fix date typo in README.md (#14366) Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>	2025-03-06 08:29:57 -08:00
Cyrus Leung	82551ad616	[Core] Don't use cache during multi-modal profiling (#14336)	2025-03-06 08:03:31 -08:00
courage17340	caac5c2e59	[Bugfix][Core] fix abort_seq_group and memory leak when n>1 (#14326) Signed-off-by: courage17340 <courage17340@163.com>	2025-03-06 23:59:32 +08:00
Thomas Parnell	6bd1dd9d26	[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)	2025-03-06 07:39:16 -08:00
Irina Yuryeva	4f27044aab	[Doc] Correct beam_search using in generative_models.md (#14363)	2025-03-06 15:37:10 +00:00
Yanyi Liu	0ddc991f5c	[Doc] Update reasoning with stream example to use OpenAI library (#14077) Signed-off-by: liuyanyi <wolfsonliu@163.com>	2025-03-06 13:20:37 +00:00
Nicolò Lucchesi	fa82b93853	[Frontend][Docs] Transcription API streaming (#13301) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-06 10:39:35 +00:00
Nicolò Lucchesi	69ff99fdcd	[Core] Optimizing cross-attention `QKVParallelLinear` computation (#12325) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal> Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>	2025-03-06 09:37:26 +00:00
lkchen	5d802522a7	[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-03-06 08:58:41 +00:00
kYLe	1769928079	[Model] Update Paligemma multimodal processing with PromptUpdate (#14015) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-06 08:31:38 +00:00
Pavani Majety	ed6ea06577	[Hardware] Update the flash attn tag to support Blackwell (#14244)	2025-03-05 22:01:37 -08:00
Nicolò Lucchesi	5ee10e990d	[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)	2025-03-05 20:00:53 -08:00
Varun Sundar Rabindranath	3dbd2d813a	[V1] LoRA - Enable more V1 tests (#14315) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-06 11:55:42 +08:00
Ce Gao	f5f7f00cd9	[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)	2025-03-06 03:49:20 +00:00
Rui Qiao	abcc61e0af	[misc] Mention `ray list nodes` command to troubleshoot ray issues (#14318) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-03-06 02:00:36 +00:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Yuan Tang	71eaf8969b	[Build] Add UV_HTTP_TIMEOUT to avoid timeout during installation (#13850) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-05 17:09:29 -08:00
Michael Goin	ca100c90fe	Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 17:08:51 -08:00
Russell Bryant	ffad94397d	[CI/Build] Use spawn multiprocessing mode for V1 test pipeline (#14243) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-05 17:08:02 -08:00
Lucas Wilkinson	4dacaa4a83	[BugFix] Fix prefix caching V0 MLA (#14255) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Ying Zhong <zhongyingmatrix@gmail.com>	2025-03-05 17:07:42 -08:00
Tyler Michael Smith	a7ea35aa67	[Bugfix] Remove num_tokens_across_dp (#14302) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-05 23:55:55 +00:00
pyc96	1e3e76b6cc	[Bugfix] Fix DeepSeek MTP crash when using TP1ModelRunner with CUDA graph due to shape mismatch (#14237) Signed-off-by: pyc96 <pychen96@gmail.com>	2025-03-05 22:22:40 +00:00
Lu Fang	53ea6ad830	[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 21:41:18 +00:00
Serena	1b7624bf5c	[misc] Add FlashMLA as a new option of VLLM_ATTENTION_BACKEND env (#14267)	2025-03-05 21:28:50 +00:00
Nick Hill	ac60dc7fe1	[V1][BugFix] Fix for mixed top_k batch (#14301) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>	2025-03-05 20:43:04 +00:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Nick Hill	a32c8669ca	[V1][Minor] Remove obsolete FIXME comment (#14304) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-05 11:59:23 -08:00
Simon Mo	ca2ca8de57	[Docs] Add Meta Slides (#14297) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-05 08:30:23 -08:00
Isotr0py	f71b00a19e	[Bugfix] Fix broken vision language example (#14292) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-05 15:57:10 +00:00
DaividFrank	8f808cf86e	prefix_caching.md: Fixed typo (#14293) Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>	2025-03-05 15:43:13 +00:00
Jee Jee Li	7bab4bb048	[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-05 23:11:29 +08:00
Isotr0py	e17e4488bd	[LoRA] Remove linear hack outside transformers backend (#14177) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-05 15:06:28 +00:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Zhe Zhang	47d4a7e004	Small update for external_launcher backend docs (#14288)	2025-03-05 21:30:00 +08:00
Cyrus Leung	7f89a594dd	[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-05 12:29:50 +00:00
Iacopo Poli	961644e6a8	[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID (#14217) Signed-off-by: Iacopo Poli <iacopo@lighton.ai>	2025-03-05 11:44:10 +00:00
Lu Fang	8d6cd32b7b	[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 08:49:44 +00:00
Roger Wang	ec79b67c77	[Misc][V1] Avoid using `envs.VLLM_USE_V1` in mm processing (#14256) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-05 07:37:16 +00:00
Benjamin Chislett	32985bed7c	[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-05 06:30:40 +00:00
Michael Goin	dae9ec464c	Temporarily disable test_awq_gemm_opcheck (#14251) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 06:10:35 +00:00
youkaichao	6eaf93020d	[platforms] improve rocm debugging info (#14257)	2025-03-04 21:32:18 -08:00
Tyler Michael Smith	72c62eae5f	[V1] EP/TP MoE + DP Attention (#13931)	2025-03-04 21:27:26 -08:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119)	2025-03-04 20:57:01 -08:00
Cody Yu	ade3f7d988	[V1][Bugfix] Do not reset prefix caching metrics (#14235)	2025-03-05 04:39:13 +00:00
rainkert	0df25101d6	[Bugfix] Fix gptq_marlin for deepseek-v3 (#13750) Signed-off-by: dangshunya <dangshunya@baichuan-inc.com> Co-authored-by: dangshunya <dangshunya@baichuan-inc.com>	2025-03-05 12:25:53 +08:00
Michael Goin	e123aafdf0	Disable GPTQ AllSpark kernels for CUDA Compiler < 12.0 (#14157) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 12:25:24 +08:00

6026 Commits