xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-05-17 17:23:34 +08:00

Author	SHA1	Message	Date
Kebe	51dd14ac2b	[Bugfix][DP] Fix creating too many DP Placement Groups (#26880) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-23 20:16:51 +00:00
Matthew Bonanni	dbfbf9f324	[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 (#27368) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-23 15:58:15 -04:00
Jonathan Chen	ca76486a16	[Chore] Separate out `vllm.utils.platform_utils.py` (#27374) Signed-off-by: Jonathan <chenleejonathan@gmail.com>	2025-10-23 19:08:06 +00:00
Ilya Markov	237cf6d32a	[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-10-23 20:58:39 +08:00
Tova Movshovitz	88afa11010	[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245) Signed-off-by: tovam <tovam@pliops.com>	2025-10-23 12:21:08 +02:00
wang.yuqi	3729ed00ba	[Model] Add num_cached_tokens for PoolingRequestOutput (#27378) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-23 14:03:42 +08:00
Giancarlo Delfin	6644796bf4	[V1][spec decode] return logprobs for spec decoding (#26060) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-22 22:59:59 -07:00
Andrew Sansom	ff93cc8c84	[CORE] Support Prefix Caching with Prompt Embeds (#27219) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-10-22 22:18:07 -07:00
PiteXChen	243ed7d32e	[Bugfix][Core] running queue index leakage exception (#26754) Signed-off-by: CLFutureX <chenyongqyl@163.com>	2025-10-22 21:40:12 -07:00
dongbo910220	a0003b56b0	[Chore] Separate out system utilities from vllm.utils (#27201) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 20:25:25 +00:00
Daisy-Ma-coder	5beacce2ea	[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128) Signed-off-by: qqma <qqma@amazon.com> Co-authored-by: qqma <qqma@amazon.com>	2025-10-22 19:36:39 +00:00
Sage	1651003c35	[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2025-10-22 18:13:03 +00:00
Isotr0py	084a9dae80	[Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-22 16:39:08 +00:00
Nicolò Lucchesi	4dfdb821c8	[P/D] Dynamic `kv_output_aggregator` collect size (#26734) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-22 18:07:58 +02:00
dongbo910220	3ae082c373	[Chore] Separate out optional dependency checks from vllm.utils (#27207) Signed-off-by: dongbo910220 <1275604947@qq.com> Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 10:44:21 -04:00
Benjamin Chislett	19748806f0	[Bugfix] skip cuda graph for drafter when running with eager (#26821) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-21 15:39:09 -07:00
ExtReMLapin	4a8a567e16	Updated xgrammar backend to not deny supported string formats (#27253) Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr> Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com> Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-21 22:25:23 +00:00
Tao He	250fb1b8ea	[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-21 18:27:03 +00:00
Nick Hill	647214f3d5	[V0 Deprecation] Remove V0 executors (#27142) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-21 11:09:37 -07:00
Eugene Khvedchenya	e93ff6c8b9	Nemotron Nano V2 VL + EVS Video Support (#27107) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Natan Bagrov <nbagrov@nvidia.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Natan Bagrov <nbagrov@nvidia.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-20 22:19:11 +08:00
Andy Lo	b63f2143f8	[LoRA] LoRA cuda graph specialization (#25914) Signed-off-by: Andy Lo <andy@mistral.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-20 04:21:09 +00:00
Sergei Skvortsov	f6fdacd82c	[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled (#26586) Signed-off-by: southfreebird <yvorott@gmail.com>	2025-10-19 19:24:46 +00:00
Cyrus Leung	d31f7844f8	[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-19 05:20:55 -07:00
iAmir97	7a6c8c3fa1	[Chore] Separate out `vllm.utils.network_utils` (#27164) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>	2025-10-19 03:06:32 -07:00
dongbo910220	8a297115e2	[Chore] Separate out hashing utilities from vllm.utils (#27151) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-10-19 11:09:38 +08:00
22quinn	191eed0bb9	[BugFix] Fix lazy imports involving outlines_core (#27158) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-19 02:35:32 +00:00
Tova Movshovitz	83e760c57d	[V1][Metrics][Plugin] Add plugin support for custom `StatLoggerBase` implementations (#22456) Signed-off-by: tovam <tovam@pliops.com>	2025-10-18 15:12:46 -07:00
Nick Hill	3b45075206	[Minor] Add some clarifying comments to recent changes (#27130) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-18 09:52:45 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Nicolò Lucchesi	b26b70bec4	[Misc] Refactor `get_kv_cache_spec` into `AttentionLayerBase` (#26587) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-18 13:51:21 +00:00
Fadi Arafeh	ab4be40fc5	[fix][cpu] fix prefill attention in CPU attention backend (#27035) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-10-18 13:30:21 +00:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
Hanchenli	7c572544e4	[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515) Signed-off-by: Hanchenli <lihanc2002@gmail.com> Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com> Signed-off-by: Wei Wei <wwei6@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Wei Wei <wwei6@meta.com> Co-authored-by: Wei Wei <weiweinpu@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-17 21:55:54 -07:00
Pradyun92	acedc74b1a	[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077) Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com> Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>	2025-10-17 13:27:47 -07:00
Patrick von Platen	b038d9c40c	[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-17 08:24:42 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Jee Jee Li	fec2b341ad	[Kernel] Lazy import FlashInfer (#26977)	2025-10-17 04:48:18 +00:00
Nick Hill	fe3b9372ad	[Core] Change `execute_model_with_error_logging()` to be a ctx manager (#27060) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-17 11:45:32 +08:00
Lukas Geiger	4d055ef465	Remove unused imports (#26972) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-16 19:51:17 -07:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
rongfu.leng	5afd3276df	[Feature] Add process_weights_after_loading to AttentionImpl (#26870) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-10-16 08:02:30 -07:00
Cyrus Leung	d2740fafbf	[Chore] Separate out `vllm.utils.collections` (#26990) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 08:35:35 +00:00
Akash kaothalkar	f7d318de2b	[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling (#26987) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-10-15 22:36:59 -07:00
Bram Wasti	7d8975de84	Deepseek-v3 Batch Invariant on 8xH100 (#26609) Signed-off-by: Bram Wasti <bwasti@meta.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-15 22:06:02 -07:00
Vadim Gimpelson	785d8b6410	[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-16 12:18:31 +08:00
Cyrus Leung	f6cdc9a02f	[Chore] Rename `utils` submodules (#26920) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 03:58:13 +00:00
Angela Yi	e19b16dde6	[bugfix] Fix SP + PP without specifying compile size (#26955) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-10-15 20:05:33 -07:00
Adrian Abeyta	0a9ef0cfce	Move query quantization to attention layer for Flashinfer & Triton. (#26534) Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Adrian Abeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 19:01:38 -04:00
Cyrus Leung	828523ad8e	[Chore] Separate out `vllm.utils.async_utils` (#26913) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-15 15:33:00 +00:00

1493 Commits