xinyun/vllm - vllm - Silk Road New Cloud code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-27 11:47:22 +08:00

Author	SHA1	Message	Date
Wentao Ye	c894c5dc1f	[Bug Fix] Fix address/port already in use error for deep_ep test (#20094 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 22:33:13 +08:00
Michael Goin	1f5d178e9c	Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128 )	2025-06-26 07:32:22 -07:00
TJian	27c065df50	[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) (#19904 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-06-26 12:42:31 +00:00
Michael Yao	84c260caeb	[Docs] Improve frameworks/helm.md (#20113 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-06-26 10:41:51 +00:00
Reid	167aca45cb	[Misc] Use collapsible blocks for benchmark examples. (#20017 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-26 03:35:16 -07:00
Li, Jiang	0567c8249f	[CPU] Fix torch version in x86 CPU backend (#19258 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-26 03:34:47 -07:00
Wentao Ye	d188913d99	[Refactor] Remove unused library (#20099 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 09:16:10 +00:00
Cyrus Leung	1d7c29f5fe	[Doc] Update docs for New Model Implementation (#20115 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-26 00:47:06 -07:00
Seiji Eicher	65397e40f5	[Bugfix] Allow `CUDA_VISIBLE_DEVICES=''` in `Platform.device_id_to_physical_device_id` (#18979 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-26 00:01:57 -07:00
Ekagra Ranjan	9502c38138	[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083 )	2025-06-25 22:06:27 -07:00
Nicolò Lucchesi	2582683566	[PD] Skip `tp_size` exchange with rank0 (#19413 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-06-25 20:04:39 -07:00
Michael Goin	754b00edb3	[Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-26 01:01:17 +00:00
Michael Goin	296ce95d8e	[CI] Add SM120 to the Dockerfile (#19794 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-25 16:23:56 -07:00
Chenyaaang	2d7620c3eb	[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-25 15:51:02 -07:00
Nick Hill	55c65ab495	[P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-25 15:19:44 -07:00
Chengji Yao	2cc2069970	[TPU][Bugfix] fix kv cache padding (#20048 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-06-25 21:24:10 +00:00
zhrrr	9f0608fc16	[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062 ) Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-06-25 21:03:17 +00:00
QiliangCui	4e0db57fff	Fix the path to the testing script. (#20082 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-06-25 20:48:17 +00:00
Nick Hill	c40692bf9a	[Misc] Add parallel state `node_count` function (#20045 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-25 13:38:53 -07:00
lkchen	4734704b30	[PD] let toy proxy handle /chat/completions (#19730 ) Signed-off-by: Linkun <github@lkchen.net>	2025-06-25 15:17:45 -04:00
Eldar Kurtić	8b8c209e35	static_scaled_fp8_quant should not run when scale.numel is not 1 (#20076 )	2025-06-25 15:08:03 -04:00
lsz05	23a04e0895	[Fix] Support cls pooling in ModernBertPooler (#20067 ) Signed-off-by: shengzhe.li <shengzhe.li@sbintuitions.co.jp>	2025-06-25 15:07:45 -04:00
Dipika Sikka	02c97d9a92	[Quantization] Add compressed-tensors emulations support for NVFP4 (#19879 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-06-25 14:28:19 -04:00
Nicolò Lucchesi	e795d723ed	[Frontend] Add `/v1/audio/translations` OpenAI API endpoint (#19615 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-06-25 17:54:14 +00:00
cjackal	8359f4c8d8	[V1][Speculative Decoding] Fix DeepSeek MTP (#20022 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-06-25 08:41:02 -07:00
Michael Goin	bf5181583f	[Doc] Guide for Incremental Compilation Workflow (#19109 )	2025-06-25 22:06:46 +09:00
Reid	c53fec1fcb	[doc] add reference link for Intel XPU (#20064 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-25 12:24:07 +00:00
Lucas Wilkinson	0f9e7354f5	[BugFix] Fix full-cuda-graph illegal memory access in FA3 (#20057 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-06-25 08:39:04 +00:00
Aaron Pham	ba7ba35cda	[Chore] debloat some initial logs (#19438 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-06-25 06:36:22 +00:00
bnellnm	015fab8c2f	[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-24 23:22:58 -07:00
Max Wittig	f59fc60fb3	[Feat][CLI] enforce-include-usage (#19695 ) Signed-off-by: Max Wittig <max.wittig@siemens.com>	2025-06-25 01:43:04 -04:00
Wentao Ye	879f69bed3	[Refactor] Remove duplicate `ceil_div` (#20023 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-25 05:19:09 +00:00
David Xia	7108934142	[Frontend] speed up import time of vllm.config (#18036 ) Signed-off-by: David Xia <david@davidxia.com>	2025-06-25 00:41:11 -04:00
h-avsha	3443aaf8dd	Move to a faster base64 implementation (#19984 ) Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>	2025-06-24 20:33:51 -07:00
Isotr0py	2273ec322c	Revert "Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor" (#20030 )	2025-06-25 11:23:29 +08:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820 )" (#20049 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Brayden Zhong	1afa9948f5	[Llama4] Update `attn_temperature_tuning` (#19997 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-24 22:42:53 -04:00
Eli Uriegas	0d06b533a0	cmake: Update vllm_flash_attn for vllm_kernels (#20032 ) Signed-off-by: Eli Uriegas <eliuriegas@meta.com>	2025-06-24 22:44:10 +00:00
Boyuan Feng	c01d1c5aba	use .dev for version comparison with pytorch nightly release (#20031 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-06-24 21:52:16 +00:00
Brayden Zhong	ead369845d	[Easy] Remove submodule added in #19463 (#20039 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-24 13:23:15 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
lkchen	91f7d9d0b6	[P/D] Asynchronously do _nixl_handshake (#19836 ) Signed-off-by: Linkun Chen <github@lkchen.net> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-24 12:46:10 -07:00
Nick Hill	8619e7158c	[BugFix] Fix multi-node offline data parallel (#19937 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-24 12:45:20 -07:00
d.transposed	c635c5f744	[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423 ) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-06-24 18:41:49 +00:00
Lucas Wilkinson	a045b7e89a	[Perf] Improve/Fix-regression for FA3 in High QPS regimes (#19463 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-06-24 13:09:01 -04:00
amit	981eeca41a	[Fix][V1] Remove --scheduling-policy oracle (#20010 ) Signed-off-by: amit <amit.man@gmail.com>	2025-06-24 09:52:15 -07:00
Reid	26d34eb67e	refactor example - qwen3_reranker (#19847 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-24 14:03:20 +00:00
Li, Jiang	53da4cd397	[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-24 13:20:04 +00:00
Vadim Gimpelson	9a3b88328f	[PERF] Speedup of MRoPE prepare inputs (#19939 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-06-23 23:01:26 -07:00
Reid	3014c920da	add some examples for other benchmark scripts (#19893 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-24 05:57:46 +00:00

7323 Commits