xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git; synced 2026-04-13 16:27:04 +08:00

Author	SHA1	Message	Date
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
Pleaplusone	8da2f28f53	[ROCm][BugFix]Fix `get_cu_count` in rocm_aiter_fa.py (#28618) Signed-off-by: ganyi <ygan@amd.com>	2025-11-13 14:18:20 +00:00
tjandy98	4504e8029b	[Bugfix] Prevent crash on empty grammar string (#28210) Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>	2025-11-13 06:42:29 +00:00
Pleaplusone	ca00b1bfc6	[ROCm][BugFix] Remove the usage of `device_info` from aiter (#28383) Signed-off-by: ganyi <ygan@amd.com>	2025-11-12 21:43:42 -08:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
Michael Goin	a543e678b4	[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support (#28561) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-12 19:40:59 -07:00
Wei Wei	478ee511de	[Misc]Fix typo in llm_engine.py (#28584) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-12 12:59:43 -08:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
alberto	bac904565f	Implement ARC KV cache eviction policy for CPU offloader (#27039) Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: alberto <aperdomo@redhat.com> Co-authored-by: Or Ozeri <or@ozery.com>	2025-11-12 09:51:39 -08:00
Benjamin Chislett	304419576a	[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-13 01:56:40 +09:00
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Isotr0py	3f770f4427	[Performance] Cache loaded custom logitsprocs to avoid overheads (#28462) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-11 16:49:29 -08:00
Max Hu	412e153df5	[Feature] Allow configuring FlashInfer workspace size (#28269) Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 23:32:20 +00:00
wangxiyuan	d4902ba56d	[Misc] Cleanup Executor interface (#28441) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 22:28:07 +00:00
Jie Luo	8c32c6e4b4	[Misc] fix typo in DCP comment (#28389) Signed-off-by: Livinfly <luojie3m@gmail.com>	2025-11-11 10:59:16 -08:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Cyrus Leung	afffd3cc8a	[Model] Pass `mm_features` directly into `get_mrope_input_positions` (#28399) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 21:14:48 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Wei Wei	bf6a3d0ff5	[Misc] Add more scoping for improved trace (#28329) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-10 21:03:21 +00:00
Rémi Delacourt	6d54336ae5	[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-10 14:53:32 -05:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Lucas Wilkinson	e8697faf03	[V0 deprecation] Remove no longer used `get_metadata_cls` (#28370) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 14:32:09 +08:00
usberkeley	4a8d6bd168	Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-09 19:11:46 +00:00
Lucas Wilkinson	636efd10a5	[Core] Separate out attention metadata building logic from prepare inputs (#26764) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-09 13:51:43 -05:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Ning Xie	e5e9067e61	[Misc] fix typo and add detailed log (#28178) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-09 05:33:46 +00:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
22quinn	608bb14462	[Attention] Remove max cudagraph size limit of 992 (#27840) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-07 22:33:27 -08:00
gnovack	70af44fd10	[bugfix] support eagle with lora cudagraph specialization (#28318) Signed-off-by: gnovack <gnovack@amazon.com>	2025-11-08 03:25:45 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Nick Hill	67a2da890e	[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 22:11:03 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Nicolò Lucchesi	68a72a5cc1	Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-07 15:07:01 +00:00
Lukas Geiger	e0919f331d	[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-07 12:14:29 +00:00
Zhang Xiangze	7bdb42b2f2	[CPU]Avoid repeated random sample compile (#28260) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-07 11:03:57 +00:00
Jialin Ouyang	ccd98b59c1	[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-07 00:27:12 -08:00
StanHatko	e52e4da971	[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953) Signed-off-by: Stan Hatko <stan_hatko@live.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-11-06 23:47:11 +08:00
Aditya Tewari	3755c14532	[CPU] Enable torch profiling (#28130) Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>	2025-11-06 07:32:05 +00:00
Dayeol Lee	1767658559	[Debugging] Add annotation for easier trace analysis (#22496)	2025-11-05 16:52:52 -08:00
Kuntai Du	efe73e9b57	[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (#25431) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-06 00:12:00 +00:00
Snehlata	e15601789b	[Feature]: Add corrupted request metric to V1 metrics system. (#27306) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-11-05 13:45:29 -08:00

1613 Commits