xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-07 08:17:11 +08:00

Author	SHA1	Message	Date
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145)" (#30199)	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Nick Hill	4026ae31e9	[Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 20:59:04 -08:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Yong Hoon Shin	69520bc695	Add logging for cudagraph related info (#29825) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-12-03 01:01:48 -08:00
Sage Moore	e6f114ac25	[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-12-02 13:20:22 -09:00
Isotr0py	63b1da76ba	[Chore]: Reorganize gguf utils funtions under `transformers_utils` (#29891) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-02 17:33:23 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Tsukasa OI	762a4a6ca9	[Frontend] Perform offline path replacement to `tokenizer` (#29706) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-11-28 18:32:08 -08:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Isotr0py	f946a8d743	[Chore]: Reorganize model repo operating functions in `transformers_utils` (#29680) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-28 08:46:51 -08:00
Cyrus Leung	9e6bcda3ac	[mypy] Enable type checking for more directories (#29674) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 08:39:27 -08:00
wang.yuqi	f4b76056ee	Improve enable chunked_prefill & prefix_caching logic. (#26623) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-27 22:05:48 -08:00
Cyrus Leung	ea228b4491	[Misc] Remove unused code from `protocol.py` (#29616) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 18:39:59 +00:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00
Harry Mellor	a1f2676879	Scheduled removal of `override_pooler_config` and `disable_log_requests` (#29402) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-25 16:08:57 +00:00
Yifan Qiao	48ddb02b79	[Hybrid Allocator] Support KV cache groups with different block_size (#29143) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-25 10:30:57 -05:00
Injae Ryou	794029f012	[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137) Signed-off-by: Injae Ryou <injaeryou@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-25 14:28:53 +00:00
Thomas Parnell	516c3f7847	[Bugfix] Fix logic for choosing default prefix caching setting (#29393) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-25 14:05:10 +00:00
wang.yuqi	de6889946b	[Misc] Suppress log outputs when constructing the default vllm config. (#29291) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 03:00:44 -08:00
zhrrr	f242cfcdd5	[Perf] use cpu all reduce to avoid sync when async_scheduling & dp > 1 (#29311) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-25 15:31:07 +08:00
Harry Mellor	316c8492bf	Scheduled removal of `guided_*` config fields (#29326) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 05:24:05 +00:00
Harry Mellor	a4ad43ad5a	Scheduled removal of `ParallelConfig`'s direct child EPLB fields (#29324) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 01:58:58 +00:00
Fadi Arafeh	730bd35378	[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON (#29193) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-22 09:04:36 -08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
Cyrus Leung	ceca060501	[Deprecation] Deprecate `seed=None` (#29185) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 18:19:25 +00:00
Cyrus Leung	d7219bcda3	[Misc] Move dynamic seed initialization to `EngineArgs` (#29165) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 15:27:44 +00:00
Jee Jee Li	9875be6431	[LoRA][2/2]Remove LoRA extra vocab (#28545) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-21 09:46:43 +08:00
Samit	371b1d4c61	[RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037) Signed-off-by: SamitHuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: samithuang <285365963@qq.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-20 03:01:03 -08:00
Cyrus Leung	20e4497be2	[V0 Deprecation] Remove `num_lookahead_slots` (#29000) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-20 06:39:10 +00:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Didier Durand	7ed27f3cb5	[Doc]: fix typos in various files (#28945) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-18 22:52:30 -08:00
Cyrus Leung	bf9e1e8767	[Bugfix] Fix wrong CLI defaults for dynamic `SchedulerConfig` fields (#28872) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-17 20:30:29 -08:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
Zhuohan Li	dd6ac1c2bb	[RL] [V1] Remove unused device argument from reset_kv_cache (#28766) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-14 23:59:42 -08:00
Nicolò Lucchesi	6f1e7f7226	[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 09:58:01 -07:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Nick Hill	327c0a9a23	[BugFix] Ensure `EngineArgs.create_engine_config` is idempotent (#28515) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-13 17:14:08 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00

989 Commits
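Commit cabc77cc86 (#27793) describes three KV cache residency histograms (block lifetime from allocation to free, idle time before eviction, and the gap between consecutive reuses of the same block), gathered with monotonic timestamps and roughly 1% block sampling. The following is a minimal Python sketch of that bookkeeping under stated assumptions: `ResidencyTracker` and its `on_allocate` / `on_access` / `on_free` hooks are hypothetical names, not vLLM's API, and samples are appended to plain lists instead of being observed into Prometheus histograms.

```python
import random
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class _BlockTimes:
    # Per-block timestamps in seconds, taken from a monotonic clock.
    allocated_at: float
    last_access_at: float
    last_reuse_at: Optional[float] = None


@dataclass
class ResidencyTracker:
    """Hypothetical sampled KV block residency tracker (not vLLM's code)."""

    sample_rate: float = 0.01  # fraction of allocated blocks to track
    clock: Callable[[], float] = time.monotonic
    rng: random.Random = field(default_factory=random.Random)
    lifetimes: List[float] = field(default_factory=list)
    idle_before_evict: List[float] = field(default_factory=list)
    reuse_gaps: List[float] = field(default_factory=list)
    _tracked: Dict[int, _BlockTimes] = field(default_factory=dict, init=False, repr=False)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def __post_init__(self) -> None:
        if not 0.0 <= self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be in [0, 1]")

    def on_allocate(self, block_id: int) -> None:
        # Only a sampled subset of blocks gets a record; the rest carry no state.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            with self._lock:
                self._tracked[block_id] = _BlockTimes(now, now)

    def on_access(self, block_id: int) -> None:
        # A cache hit on an allocated block counts as a reuse.
        now = self.clock()
        with self._lock:
            rec = self._tracked.get(block_id)
            if rec is None:
                return
            if rec.last_reuse_at is not None:
                # Gap between this reuse and the previous one.
                self.reuse_gaps.append(now - rec.last_reuse_at)
            rec.last_reuse_at = now
            rec.last_access_at = now

    def on_free(self, block_id: int, evicted: bool) -> None:
        # Lifetime is recorded for every free; idle time only for evictions.
        now = self.clock()
        with self._lock:
            rec = self._tracked.pop(block_id, None)
            if rec is None:
                return
            self.lifetimes.append(now - rec.allocated_at)
            if evicted:
                self.idle_before_evict.append(now - rec.last_access_at)
```

Injecting `clock` makes the arithmetic easy to check: with a fake clock, a block allocated at t=0, reused at t=2 and t=5, and evicted at t=9 yields a lifetime of 9 s, one reuse gap of 3 s, and 4 s idle before eviction. Sampling keeps per-block state proportional to `sample_rate` times the number of live blocks, which is the overhead trade-off the commit message describes.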