xinyun/vllm - vllm - 丝路新云 (Silk Road Xinyun) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-10 12:37:15 +08:00

Author	SHA1	Message	Date
Benjamin Chislett	85aff45e24	[Perf] Remove blocking copy in GDN Attention (#31167 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-12-22 14:25:22 -08:00
Wentao Ye	5312a7284e	[Bug] Fix `'CutlassMLAImpl' object has no attribute '_workspace_buffer'` (#31173 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-22 14:24:27 -08:00
Lucas Wilkinson	de71747655	[SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-22 13:06:10 -08:00
Pavani Majety	b10f41c894	[SM100] Enable fp8 compute for prefill MLA (#30746 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-12-22 19:15:57 +00:00
Boyuan Feng	8dd0db687b	[UX] improve profiler error message (#31125 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-22 08:45:59 -08:00
dengyunyang	8f8f469b1b	[BugFix] skip language model in Encoder (#30242 ) Signed-off-by: dengyunyang <584797741@qq.com>	2025-12-22 05:25:59 -08:00
Jeffrey Wang	1501a4070e	[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2025-12-20 10:29:31 +00:00
Lucas Wilkinson	5f6477d1d0	[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-19 16:07:54 -05:00
Seiji Eicher	1ab5213531	Make engine core client handshake timeout configurable (#27444 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-12-19 20:38:30 +00:00
Nick Hill	2ac85a4544	[BugFix] Fix logprobs with spec decode and modified logits (#30846 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 19:58:28 -08:00
Nick Hill	45c0526ac9	[BugFix] Handle errors when preprocessing added requests (#30895 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-19 01:29:11 +00:00
Benjamin Chislett	d6b3d39b6d	[Cleanup] Refactor FlashInferMetadataBuilder (#29128 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-18 14:45:30 -08:00
Nick Hill	b0b77c4655	[BugFix] Fix spec decode + structured outputs + preemption edge case (#30916 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 12:59:55 -08:00
Chen Zhang	24b65eff0d	[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (#30319 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-12-18 19:47:56 +00:00
Alec	62be3670cb	[BugFix] Add sleep to fix tight loop and release GIL (#29476 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-18 09:52:55 -08:00
Nick Hill	686cbaac64	[Cleanup] Remove unused ModelRunner V1 `InputBatch.num_tokens` field (#30218 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 09:17:00 -08:00
Andreas Karatzas	be2ad5f920	[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-18 07:04:57 +00:00
Yifan Qiao	11a89cf95c	[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-18 06:42:21 +00:00
Micah Williamson	fd8afdf38d	[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-18 10:27:37 +08:00
SungMinCho	a0b782f9cc	[Metrics] Model FLOPs Utilization estimation (#30738 ) Signed-off-by: SungMinCho <tjdals4565@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-18 01:40:51 +00:00
Isotr0py	74a1ac38b0	[v1] Add PrefixLM support to TritonAttention backend (#30386 )	2025-12-17 16:05:24 -08:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00
Cyrus Leung	2497228ad4	[Chore] Factor out logic for requesting initial memory (#30868 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-17 07:32:17 -08:00
Jialin Ouyang	6e9dbcc50e	[Fix] uniform decode batch check (#30747 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-17 19:58:43 +08:00
Harry Mellor	fb980eb2fd	Fix lazy import (#30858 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-17 03:33:50 -08:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Lucas Wilkinson	9fec0e13d5	[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-12-16 17:10:16 -05:00
Harry Mellor	e1625498f4	Update where `bytes_to_unicode` is imported from (#30771 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-16 08:05:01 -08:00
Lucas Wilkinson	00a8d7628c	[BugFix] Fix memory spike in workspace allocation (#30744 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-16 06:46:22 -08:00
Nicolò Lucchesi	75eb302a2e	[Bugfix] Whisper fix number of allocated CrossAttn blocks per-request (#30772 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-16 14:20:19 +00:00
Pleaplusone	9dbbc59b15	[ROCm][MTP] Support MTP for AITER MLA backend (#28624 ) Signed-off-by: ganyi <ygan@amd.com>	2025-12-16 14:10:26 +00:00
Jee Jee Li	0e391e7570	[Bugfix] Fix RequestOutput miss lora_request (#30636 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-16 01:36:35 -08:00
jiangkuaixue123	b9ff4f2a8d	[feature] extend DBO to XBO (#30120 ) Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com> Co-authored-by: root <root@hk01dgx028.cm.cluster>	2025-12-16 00:04:01 -05:00
Matthew Bonanni	60dbf7d8f1	Update batch invariant to use attention config (#30704 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-15 15:24:16 -05:00
Jee Jee Li	a524d1ba0a	[Bugfix] Fix deepseek_v32 tokenizer_mode (#30658 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-15 04:20:31 +00:00
Or Ozeri	174e39ead7	CPU KV Offloading: Use more CUDA streams (#29013 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-14 23:50:45 +00:00
Johannes F	060893654d	fix: Update json features supported by xGrammar (#30390 ) Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com> Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com> Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-14 02:16:06 -08:00
drslark	add1b9d3de	[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632 ) Signed-off-by: drslark <slarksblood@qq.com>	2025-12-14 01:32:16 -08:00
Wentao Ye	6e78ed6ba7	[Logs] Optimize startup logs 4 (#29903 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-13 16:12:53 -05:00
Isotr0py	7c16f3fbcc	[Doc] Add documents for multi-node distributed serving with MP backend (#30509 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-13 18:02:29 +00:00
Cyrus Leung	39cefbdf17	[Refactor] `TokenizerRegistry` only uses lazy imports (#30609 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 23:16:22 +08:00
Cyrus Leung	64251f48df	[Chore] Adjust tokenizer import to avoid circular imports (#30601 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 04:42:39 -08:00
Nick Hill	1cec5b7ea9	[Scheduer] Simplify stop checking for pooling models (#30591 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-13 09:45:26 +00:00
Cyrus Leung	b09806e28f	[Bugfix] Dictionary MM embeddings for online chat (#30507 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 15:48:56 +08:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
Wentao Ye	02a5880394	[CI] Fix mypy for vllm/v1/executor (#30517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-12 18:05:34 +00:00
realliujiaxu	d2c919dcc2	[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059 ) Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-12-12 09:03:35 -08:00
jvlunteren	9c0ee995a8	[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com> Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-12-12 16:55:40 +01:00
Lucas Wilkinson	3e41992fec	[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-12 05:57:47 -08:00
Lucas Wilkinson	042da73244	[Core] Refactor `_build_attention_metadata` (#29628 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-11 17:54:12 -08:00

1893 Commits