xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-02 20:37:11 +08:00

Author	SHA1	Message	Date
Qidong Su	4587063267	Patch DeepEP when building docker image with CUDA 13 (#29154 ) Signed-off-by: Qidong Su <soodoshll@gmail.com>	2025-11-22 23:25:13 +00:00
Wentao Ye	472fdee974	[Chore] Update batch invariant code owner (#29246 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-22 13:50:02 -08:00
Yizhou	df78aeef08	Refactor: Move CUDA graph dispatch logic earlier (#27382 ) Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-11-22 16:10:31 -05:00
Nick Hill	7df331c66b	[BugFix] Fix chunked prompt logprobs + preemption (#29071 )	2025-11-22 16:07:18 -05:00
Benjamin Bartels	eb5352a770	[CI/build] Removes source compilation from runtime image (#26966 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-11-22 10:23:09 -08:00
Cyrus Leung	d1cf8214e5	[Bugfix] Use HF config fields as fallback when loading Mistral config (#29239 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 11:22:48 -07:00
Fadi Arafeh	730bd35378	[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON (#29193 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-22 09:04:36 -08:00
Federico	f55c76c2b3	chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240 )	2025-11-22 08:42:48 -08:00
ZiTian Zhao	d84d8f4429	Fix EVS crash when using `video_embeds` inputs in Qwen2.5-VL (#29232 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 06:48:59 -08:00
Cyrus Leung	ae66818379	[Misc] Fix pre-commit (#29238 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 06:48:01 -08:00
Nick Hill	d44a63c6d6	[BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-22 22:41:25 +08:00
Nicolò Lucchesi	066209a045	[Attention] Refactor FA `block_size` limitations to hybrid models only (#29084 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-22 06:38:44 -08:00
Bram Wasti	5f7209a793	[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-22 21:00:50 +08:00
yihong	2d4978a57e	fix: clean up unused function in setup.py (#29061 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-22 05:00:04 -08:00
Nandan Vallamdasu	6965a392a4	Fix: Resolve circular import in model_loader/utils.py (#29189 ) Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com> Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 04:58:22 -08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
rasmith	8e22da1d7f	[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py (#29229 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-22 11:00:54 +00:00
rasmith	a4fdf2405c	[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-22 10:59:39 +00:00
Jane (Yuan) Xu	e6309acdba	Simplify `from_blob` usage in `get_cuda_view_from_cpu_tensor` (#29027 ) Signed-off-by: Jane Xu <janeyx@meta.com>	2025-11-22 10:35:32 +00:00
jinghanhu	988ee66b0d	Handle triton kernel import exception (#29062 )	2025-11-22 10:07:50 +00:00
Mads Kildegård	ea38474ac5	[Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests (#29175 ) Signed-off-by: Mads Kildegård <mkildegaard99@gmail.com>	2025-11-22 09:58:22 +00:00
Andrew Xia	742e9ff6b3	[responsesAPI] parse reasoning item input (#28248 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 15:42:11 +08:00
Woosuk Kwon	e9056056fb	[Model Runner V2] Limit cudagraph size to max decode batch size (#29221 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-21 20:21:35 -08:00
Jee Jee Li	1489902b53	[LoRA] Cleanup FusedMoEWithLoRA (#29187 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-22 04:01:30 +00:00
Yanan Cao	933f67ecd8	[Bugfix] Fix a conditional to not check zero value (#28754 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-21 19:59:07 -08:00
rasmith	fd65015a14	[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 20:34:33 -07:00
Yihua Cheng	77e1c035d0	[chore][LMCache connector] Remove useless logs from lmcache connector (#29069 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-11-22 03:18:00 +00:00
rasmith	6f403501a0	[CI/Build][AMD] Enable Entrypoints Integration Test (Pooling) to run without error on ROCm (#29212 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-22 02:13:18 +00:00
FlintyLemming	052950e5b3	Add fused MoE config for H200 E160 N192 fp8 (#29182 ) Signed-off-by: FlintyLemming <admin@flinty.moe>	2025-11-21 17:37:51 -08:00
qli88	1ef9c9e294	[CI/Build] Disable test_gptoss_tp.py in 'LoRA TP Test' group for ROCm platform (#29204 ) Signed-off-by: qli88 <qiang.li2@amd.com>	2025-11-21 17:36:19 -08:00
Jie Luo	5c8f2adf50	[Bugfix] Fix block size in block_table with PCP (#29094 ) Signed-off-by: Livinfly <luojie3m@gmail.com>	2025-11-22 01:34:28 +00:00
Ryan Rock	ed8e6843cc	[CI/Build] Add terratorch for AMD (#29205 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-11-21 17:31:22 -08:00
Lukas Geiger	d045e22dfe	[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-21 17:30:55 -08:00
Wentao Ye	1d34eb11e0	[CI] Bug: Fix triton import issue (#29202 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 17:14:49 -08:00
Charlie Fu	9a3101b2ba	[ROCm][CI] Fix DeepSeek-V2-Lite Accuracy CI (#29135 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-11-21 17:11:02 -08:00
Angela Yi	d5dbdbfcb2	[docs] Fix cudagraph mode config (#29170 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-11-21 17:10:27 -08:00
Lucas Wilkinson	30d6466238	[BugFix] Fix Eagle `IndexError: list index out of range` for even `num_speculative_tokens` (#29102 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-22 00:47:05 +00:00
Woosuk Kwon	e9af6ba62a	[Model Runner V2] Optimize Gumbel Sampling Kernel (#29210 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-21 15:52:28 -08:00
Mark McLoughlin	c6fa3895e9	[KV Connector] Fix async connector prefix cache metrics (#28585 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-21 17:45:00 -05:00
Varun Sundar Rabindranath	3137991f55	[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-21 14:28:17 -08:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Lucas Wilkinson	c68c7b403d	[BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-21 13:58:32 -08:00
Ning Xie	53a1ba6ec5	[log] add weights loading time log to sharded_state loader (#28628 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-21 21:06:09 +00:00
Lucas Wilkinson	1840c5cb18	[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-21 11:41:52 -08:00
Woosuk Kwon	1bed891f72	[Chore] Fix pre-commit error after #25266 (#29190 )	2025-11-21 10:21:40 -08:00
Cyrus Leung	ceca060501	[Deprecation] Deprecate `seed=None` (#29185 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 18:19:25 +00:00
Charlie Fu	75648b16dd	[ROCm][CI] Fix config/test_config_generation.py (#29142 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-11-21 17:12:16 +00:00
Chendi.Xue	460d02a417	[NIXL] Fix after virtual block_size for host_buffer with heter kv_layout (#29122 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-11-21 08:55:27 -08:00
Mingyuan Ma	b4c8fbaae2	Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892 ) Signed-off-by: mingyuanm <mingyuanm@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 09:54:11 -07:00
rasmith	e99e467384	[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lightning_attn.py (#29132 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 11:53:09 -05:00

11581 Commits