xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-24 11:06:47 +08:00

Author	SHA1	Message	Date
Matthew Bonanni	794a7875ee	[Misc] Consistent case for `vllm bench serve` results (#30403 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-10 09:44:02 -08:00
Mark McLoughlin	2dcbac9077	[Docs] Generate full list of metrics in user docs (#30388 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-10 16:09:34 +00:00
Wilson Wu	3bdd426636	Fix typos in comments across multiple files (#30345 ) Signed-off-by: Wilson Wu <iwilsonwu@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-12-09 20:05:28 -08:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Hubert de La Jonquiere	c72ea10723	[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056 )	2025-12-09 18:54:08 +08:00
Fanli Lin	c2e1987a6e	[Doc] update Intel GPU MM status in Feature x Hardware matrix (#30294 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-12-09 05:16:44 +00:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Ming Yang	60d17251c9	[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:01:08 +00:00
Simon Mo	77072e93b3	[docs] governance documents (#24801 ) Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-08 12:06:20 +00:00
wang.yuqi	9e77ffca3f	[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-08 08:10:09 +00:00
Zhiyu	cd00c443d2	[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-12-08 07:05:27 +00:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
jeremyteboul	dce6d229f7	Support multiple image/audio embeddings per requests (#29988 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-12-07 04:34:24 +00:00
Viacheslav	21bb323542	Gigachat 3 tool parser and tests (#29905 ) Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>	2025-12-06 12:04:14 +00:00
redwrasse	6476382384	prefix caching design doc sha256 now default (#29261 ) Signed-off-by: redwrasse <mail@redwrasse.io>	2025-12-06 07:39:56 +00:00
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
Tiger Xu / Zhonghu Xu	60a66ea2dc	[DOC]: Add kthena to integrations (#29931 ) Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>	2025-12-05 08:11:03 +00:00
Hubert de La Jonquiere	befb59e5b1	[Model] Add Holo2 reasoning parser (#30048 ) Signed-off-by: hdlj-h <hubert@hcompany.ai>	2025-12-05 10:38:45 +08:00
TimWang	690cc3ef20	docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041 ) Signed-off-by: Tim <tim.wang03@sap.com>	2025-12-04 23:37:14 +00:00
Tao Yun	6dcb07f676	support qwen3-vl handle requests with embeddings (#30037 ) Signed-off-by: taoyun <1069423820@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 17:34:06 +00:00
Shengqi Chen	990f806473	[Doc] clarify nightly builds in developer docs (#30019 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-05 00:28:37 +08:00
Harry Mellor	9998ea5b57	Delete HF version of Phi 4 MM (#30049 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 13:44:50 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
dtc	842aba501d	[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: dtc <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-12-04 09:51:36 +00:00
CYJiang	fd68e909db	[docs] Remove _total from counter metrics names (#30028 ) In Prometheus, Counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API. Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>	2025-12-04 07:46:15 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
ioana ghiban	1bb17ecb39	[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-12-03 13:33:50 +00:00
ioana ghiban	15b1511a15	[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. (#29962 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-12-03 12:56:47 +00:00
Amr Mahdi	f5d3d93c40	[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-03 11:41:53 +00:00
Fadi Arafeh	78f4bb0ba8	[DOC] Add Arm to list of compute resouces providers (#29894 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-03 11:36:58 +00:00
Russell Bryant	b08025a83b	[Docs] Discuss api key limitations in security guide (#29922 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-02 20:57:28 -08:00
wang.yuqi	2eb4fe9129	[examples] Resettle pooling examples. (#29365 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 15:54:28 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Louie Tsai	8bbcf8b6e7	[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-12-02 09:00:23 +00:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815 )	2025-12-01 13:42:03 -08:00
Finbarr Timbers	38caf7fa1a	Update FAQ on interleaving sliding windows support (#29796 ) Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com>	2025-12-01 19:15:19 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: `vllm:kv_block_lifetime_seconds` (total lifetime from allocation to free), `vllm:kv_block_idle_before_evict_seconds` (idle duration before eviction), and `vllm:kv_block_reuse_gap_seconds` (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: `--kv-cache-metrics` (enable KV cache residency metrics) and `--kv-cache-metrics-sample` (control sampling ratio; default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
Shengqi Chen	36db0a35e4	[CI] Renovation of nightly wheel build & generation (#29690 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 21:25:39 +08:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Yifei Zhang	1ab8fc8197	Make PyTorch profiler gzip and CUDA time dump configurable (#29568 ) Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>	2025-12-01 04:30:46 +00:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
dublc	f4341f45d3	[Doc]: fix code block rendering (#29728 ) Signed-off-by: dublc <jdublc0x@gmail.com>	2025-11-29 13:46:48 +00:00

1746 Commits