xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-03 18:35:42 +08:00

Author	SHA1	Message	Date
bk-201	c0cc07e7ee	Merge remote-tracking branch 'origin/main' into mlm-full-lora-support	2025-12-03 15:24:12 +00:00
Yong Hoon Shin	69520bc695	Add logging for cudagraph related info (#29825) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-12-03 01:01:48 -08:00
Jee Jee Li	83556e9d85	Address conflict Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-03 06:10:36 +00:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Lucas Wilkinson	5cdd664509	[BugFix] Fix assert in `build_for_cudagraph_capture` (#29893) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-02 16:56:54 -08:00
maang-h	5d91d2b292	[Doc] Add allocate_slots parameter docs (#29777) Signed-off-by: maang <maang_h@163.com> Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-02 23:23:09 +00:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
jthomson04	1528e079e2	[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-12-02 21:25:52 +00:00
Matthew Bonanni	1d93f11675	[Attention][CUDAGraph] Remove CG padding from attention backends (#29352) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-02 13:48:08 -05:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Wushi Dong	0037b5746a	[Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800) Signed-off-by: Wushi Dong <dongws@meta.com>	2025-12-02 07:08:07 +00:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Seiji Eicher	22274b2184	[Misc] Add ReplicaId to Ray metrics (#24267) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: rongfu.leng <1275177125@qq.com>	2025-12-02 03:21:44 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds (total lifetime from allocation to free); vllm:kv_block_idle_before_evict_seconds (idle duration before eviction); vllm:kv_block_reuse_gap_seconds (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics (enable KV cache residency metrics) and --kv-cache-metrics-sample (control sampling ratio; default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Isotr0py	b95db244ee	[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-01 13:12:51 +00:00
Mickaël Seznec	86e178f7c4	[crashfix] Eagle + multimodal can crash on mm cache miss (#29750) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-12-01 17:29:33 +08:00
Yifei Zhang	1ab8fc8197	Make PyTorch profiler gzip and CUDA time dump configurable (#29568) Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>	2025-12-01 04:30:46 +00:00
Woosuk Kwon	ec38a7368d	[Model Runner V2] Use packed mask for prompt bin counts (#29756) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-30 14:15:42 -08:00
Pleaplusone	8c363ed666	[ROCm][Attention] Sliding window support for `AiterFlashAttentionBackend` (#29234) Signed-off-by: ganyi <ygan@amd.com>	2025-11-30 11:31:50 +00:00
Cyrus Leung	64bc09ba27	[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 17:31:12 +08:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Vensen	66b5840287	[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-30 14:24:25 +08:00
Huamin Li	82c795d6f2	Fix AttributeError about _use_fi_prefill (#29734) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-30 06:04:55 +00:00
Cyrus Leung	fa59fe417f	[Chore] Move `detokenizer_utils` to `vllm/tokenizers` (#29727) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 06:25:17 -08:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Woosuk Kwon	f223ed4181	[Model Runner V2] Fuse penalties and temperature into single kernel (#29720) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-29 02:29:16 -08:00
Woosuk Kwon	6afc0ffaf6	[Model Runner V2] Add sample/ directory and reorganize files (#29719) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-29 00:41:01 -08:00
Jee Jee Li	39e63dec7c	[LoRA] Cleanup LoRA unused code (#29611) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 22:52:58 -08:00
Woosuk Kwon	4a80ad0a25	[Model Runner V2] Don't use UVA buffer for prefill_len (#29713) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-28 20:27:16 -08:00
Lucas Wilkinson	e23f665d83	[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-28 20:19:01 -08:00
Woosuk Kwon	ca1b1e7296	[Model Runner V2] Refactor prefill token preparation (#29712) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-28 19:49:17 -08:00
Woosuk Kwon	1dcafb3dea	[Model Runner V2] Support penalties using bin counts (#29703) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-28 17:53:17 -08:00
Augusto Yao	9726e64530	bugfix: correct attn output with base 2 or e (#28840) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>	2025-11-29 07:52:12 +08:00
Benjamin Chislett	1986de1375	[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-28 22:25:05 +00:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Didier Durand	fae6943068	[Doc]: fixing typos in multiple files. (#29685) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-28 08:41:41 -08:00
Cyrus Leung	9e6bcda3ac	[mypy] Enable type checking for more directories (#29674) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 08:39:27 -08:00
Harry Mellor	9eec282cb5	Guard FlashInfer sampler using the same check as FlashInfer attention backend (#29415) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 08:34:48 -08:00
Nick Hill	8e7a891602	[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-28 20:52:23 +08:00
Cyrus Leung	953d9c820b	[mypy] Pass type checking for `vllm/utils` and `vllm/v1/pool` (#29666) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 20:40:47 +08:00
maang-h	cc0f2a0e19	[Doc] Improve abnormal information string (#29655) Signed-off-by: maang <maang_h@163.com>	2025-11-28 00:12:20 -08:00
wang.yuqi	f4b76056ee	Improve enable chunked_prefill & prefix_caching logic. (#26623) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-27 22:05:48 -08:00
EanWang211123	37b15e97e8	[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn> Co-authored-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-27 22:05:45 -08:00
maang-h	c7ba1f6bc7	[BugFix] Fix ValueError in NewRequestData repr methods (#29392) Signed-off-by: maang <maang_h@163.com>	2025-11-28 13:42:30 +08:00
Lucas Wilkinson	be493e0b3c	[BugFix] Fix new nightly failures (#29578) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-27 13:45:38 -08:00
Woosuk Kwon	ae0ce1be27	[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput (#29623) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-27 12:38:53 -08:00
Andrii Skliar	a5345bf49d	[BugFix] Fix `plan` API Mismatch when using latest FlashInfer (#29426) Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com> Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>	2025-11-27 11:34:59 -08:00


1805 Commits