xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 01:47:16 +08:00

Author	SHA1	Message	Date
Xu Wenqing	ec89524f50	Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)	2025-06-05 16:38:54 +00:00
Patrick von Platen	f20f9f063b	[mistral_common] Add v11 tokenizer (#19193) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-06-05 08:27:41 -07:00
Guillaume Calmettes	9bc8bb07cf	[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-06-05 12:59:28 +00:00
Reid	1aeb925f34	[Frontend] improve vllm run-batch --help display (#19187) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-05 11:16:25 +00:00
22quinn	188a4590d8	[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-05 11:14:32 +00:00
vllmellm	18093084be	[Misc] Remove unnecessary fallback to prefill-decode attention (#19138) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-06-05 16:08:26 +08:00
Simon Mo	da40380214	[Build] Annotate wheel and container path for release workflow (#19162) Signed-off-by: simon-mo <simon.mo@hey.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-04 23:24:56 -07:00
Chauncey	8fc57501d3	[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-05 06:24:24 +00:00
Woosuk Kwon	af7fc84fd2	[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-05 13:41:25 +08:00
Huy Do	0678b52251	Handle non-serializable objects when dumping benchmark results (#19114)	2025-06-04 22:40:04 -07:00
Yang Wang	25b918eee6	[Torch Nightly]add missing dependency (#18770) Signed-off-by: Yang Wang <elainewy@meta.com>	2025-06-04 21:56:12 -07:00
Michael Goin	a408820f2f	[Bugfix] Fix port handling in make_zmq_path (#19117)	2025-06-04 21:00:59 -06:00
Robert Shaw	c56ed8bb0e	[Bugfix][Nixl] Fix full prefix cache hit bug (#18632) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-05 02:07:32 +00:00
Reid	78dcf56cb3	[doc] small fix (#19167) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-05 09:13:50 +08:00
Nicolò Lucchesi	b2fac67130	[P/D] Heterogeneous TP (#18833) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-06-04 23:25:34 +00:00
CYJiang	23027e2daf	[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-06-04 15:37:25 -07:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Kebe	ef3f98b59f	[Bugfix] fix v1 cpu worker fails on macOS (#19121)	2025-06-04 20:17:38 +00:00
Siyuan Liu	7ee2590478	[TPU] Update dynamo dump file name in compilation test (#19108) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 16:13:43 -04:00
Michael Goin	53a5a0ce30	[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-04 10:46:28 -07:00
Tyler Michael Smith	d459fae0a2	[Bugfix][EP+DP] Fix internode check (#19112) Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>	2025-06-04 23:39:23 +08:00
jmswen	c8dcc15921	Allow AsyncLLMEngine.generate to target a specific DP rank (#19102) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-04 08:26:47 -07:00
Cyrus Leung	8f4ffbd373	[Doc] Update V1 Guide for embedding models (#19141) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 22:57:55 +08:00
Lain	5f2cd251d2	Sm100 blockwise fp8 swap ab (#18564)	2025-06-04 07:48:45 -07:00
Xu Wenqing	02658c2dfe	Add DeepSeek-R1-0528 function call chat template (#18874) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-06-04 13:24:18 +00:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092)	2025-06-04 11:48:57 +00:00
Isotr0py	8711bc5e68	[Misc] Add packages for benchmark as extra dependency (#19089) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-04 04:18:48 -07:00
Seiji Eicher	2669a0d7b5	Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-04 17:10:45 +08:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
汪志鹏	3336c8cfbe	Fix #19130 (#19132) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-04 01:42:06 -07:00
Woosuk Kwon	b124e1085b	[Bugfix] Fix FA3 full cuda graph correctness (#19106) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-03 23:10:15 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625)	2025-06-03 21:40:26 -07:00
Calvin Chen	8d646c2e53	[Cleanup][v1]:remote guided-decoding-backend for example (#19059) Signed-off-by: calvin chen <120380290@qq.com>	2025-06-04 04:23:26 +00:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)	2025-06-03 21:13:01 -07:00
Lukas Geiger	1409ef9134	[Core] Cast multimodal input in hf processor (#18862) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-06-03 20:24:56 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441)	2025-06-03 18:43:01 -07:00
Russell Bryant	52dceb172d	[Docs] Add developer doc about CI failures (#18782) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-06-04 01:09:13 +00:00
Jiaxin Shan	abd7df2fca	[Misc] Fix path and python alias errors in disagg_prefill examples (#18919)	2025-06-03 17:15:18 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925)	2025-06-03 17:14:20 -07:00
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Nicolò Lucchesi	5d96533e22	[Bugfix][P/D] Fix Prefix Cache Bug (#18411) Signed-off-by: nicklucche <nlucches@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-06-03 23:53:16 +00:00
Chauncey	4de790fcad	[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-03 23:27:24 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00
Ekagra Ranjan	135cf55cd1	[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)	2025-06-03 15:26:33 -07:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Harry Mellor	6865fe0074	Fix interaction between `Optional` and `Annotated` in CLI typing (#19093) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikun@apache.org>	2025-06-03 21:07:19 +00:00
Michael Goin	e31446b6c8	[Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-03 13:48:25 -07:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00

6967 Commits