xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-16 15:47:22 +08:00

Author	SHA1	Message	Date
Nathan Price	05a83dc6ee	feat(api): Eager chat template warmup to eliminate first-request latency (#30700 ) Signed-off-by: Nathan Price <nathan@abridge.com>	2025-12-18 00:01:29 +00:00
Varun Sundar Rabindranath	e3fc374a9a	[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899 )	2025-12-17 15:00:59 -08:00
Andrey Talman	e06d0bf0aa	2.9.1 PyTorch release update (#28495 )	2025-12-17 12:20:22 -08:00
Xunzhuo	e3a0f21e6c	[docs]: add ecosystem projects sr in docs/governance (#30844 ) Signed-off-by: bitliu <bitliu@tencent.com>	2025-12-17 18:45:56 +00:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00
Nicolò Lucchesi	9ca8cb38fd	[CI][Bugfix] Fix flaky `tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio` (#30878 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-17 18:49:56 +01:00
Cyrus Leung	2497228ad4	[Chore] Factor out logic for requesting initial memory (#30868 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-17 07:32:17 -08:00
KimHyemin	196cdc3224	[Model] Gemma3: Support untied word embeddings (#30827 ) Signed-off-by: www-spam <panmahm@naver.com>	2025-12-17 07:11:18 -08:00
高鑫崧	b7b6a60aca	Adapt the old parameter enable_thinking in chat_template_kwargs (#30852 ) Signed-off-by: xinsong.gao <1418762819@qq.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-17 07:10:59 -08:00
rongfu.leng	9e67c4ce98	[Docs] fix function name (#30748 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-12-17 12:14:45 +00:00
Jialin Ouyang	6e9dbcc50e	[Fix] uniform decode batch check (#30747 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-17 19:58:43 +08:00
Hank_	6482e3895b	chores: adjust the attn register param order (#30688 ) Signed-off-by: Hank <hcc.mayday@gmail.com>	2025-12-17 19:58:16 +08:00
Harry Mellor	fb980eb2fd	Fix lazy import (#30858 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-17 03:33:50 -08:00
baoqian426	84896fda22	[Bugfix] deepseek-V3.2 self.weights_proj has no bias (#30841 ) Signed-off-by: baoqian <1354987947@qq.com> Signed-off-by: baoqian426 <1354987947@qq.com>	2025-12-17 03:32:34 -08:00
Kevin H. Luu	4bf6c23668	[ci] Sync test areas yaml file with test-pipeline (#30862 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-12-17 02:30:56 -08:00
Chauncey	9ad5b21710	[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-17 02:27:30 -08:00
Wentao Ye	f284d7bd0c	[Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attribute `weight_scale_inv` (#30823 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-17 02:00:35 -08:00
Zhengxu Chen	53cd7f868b	[compile] Recompile graph module during Dynamo cache loading. (#30743 ) Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>	2025-12-17 02:00:12 -08:00
danielafrimi	7b966ae2ba	[Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix for reverted PR) (#30785 ) Signed-off-by: <> Co-authored-by: root <root@gpu-937.slurm-workers-slurm.slurm.svc.cluster.local>	2025-12-17 01:56:38 -08:00
Zhengxu Chen	9db1db5949	[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 01:56:24 -08:00
Zhengxu Chen	177c391db2	[compile] Disable aot when eager backend is used. (#30810 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 01:55:56 -08:00
Michael Goin	519ef9a911	[UX] Make `vllm bench serve` discover model by default and use --input-len (#30816 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi	a100152288	[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-17 01:54:21 -08:00
Andrew Xia	4c054d89aa	[Doc][ResponsesAPI] add documentation (#30840 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-17 01:53:02 -08:00
Sheng Lin	f4e884f222	[NIXL][Bugfix] Fix NIXL/RDMA registration failure over CuMemAllocator (#29569 ) Signed-off-by: Somoku <linsh0@protonmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-12-17 01:52:58 -08:00
Xinyu Chen	3b1d440ede	CustomOp: grouped topk (#29575 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2025-12-17 17:43:00 +08:00
Asaf Joseph Gardin	a9e15c21ef	[Mamba] Removed disable cascade attn in MambaModelConfig (#30712 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-12-17 08:48:53 +00:00
Robin	20fda43151	[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction (#30555 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-12-17 16:37:57 +08:00
Yan Ma	4f735babb7	[XPU] fix broken fp8 online quantization for XPU platform (#30831 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-12-17 00:28:13 -08:00
Li, Jiang	0cd5353644	[Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models (#30829 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-16 23:25:12 -08:00
Michael Goin	d4d2751732	Update note comment for flashinfer attention warmup (#30711 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 21:29:03 -08:00
shanjiaz	009a773828	bump up compressed tensors version to 0.13.0 (#30799 ) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-12-16 21:01:04 -08:00
Cyrus Leung	44d3b1df3d	[CI/Build] Fix compatibility between #30244 and #30396 (#30787 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-16 20:21:19 -08:00
Fadi Arafeh	bb5ac1fe38	[CPU] Add action to automatically label CPU related PRs (#30678 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-17 04:21:07 +00:00
Michael Goin	811cdf5197	Update model-hosting-container-standards to 0.1.10 (#30815 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-12-16 17:52:14 -08:00
Grzegorz K. Karch	f5db6385a1	Fix nemotron_nas intermediate_size computation (#30795 ) Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>	2025-12-17 01:06:28 +00:00
Amr Mahdi	c0a88df7f7	[docker] Allow kv_connectors install to fail on arm64 (#30806 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-16 16:41:57 -08:00
Nicolò Lucchesi	e087fbc393	[MM] Pass FA version in ViT Attn (#30756 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-17 07:54:45 +08:00
Michael Goin	e80455ca8b	Replace deprecated enable_fusion with fuse_norm_quant in test_rms_group_quant (#30817 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 23:40:47 +00:00
TJian	2410132bb1	[ROCm] [Bugfix] Fix torch sdpa hallucination (#30789 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-16 15:32:43 -08:00
Michael Goin	0a1ab1e565	[Perf][Kernels] Vectorize `csrc/activations_kernels.cu` (#29512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:56:02 -08:00
Wentao Ye	b6ec077e05	[CI] Skip ci failure test (#30804 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-16 22:47:53 +00:00
Jinzhen Lin	ce96857fdd	[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) (#29901 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-12-16 14:35:28 -08:00
Daniel Cámpora	eaa82a709a	[Bugfix][DSV32] Fix overflow in topk. (#30754 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:21:17 -08:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Lucas Wilkinson	9fec0e13d5	[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-12-16 17:10:16 -05:00
jiahanc	254a7f8fd6	[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE (#30014 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-12-16 13:01:48 -08:00
Wentao Ye	f21f5ea38c	[Refactor] Small refactor for group topk (#30562 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-12-16 14:50:59 -05:00
Nicolò Lucchesi	ca702a14dc	[Frontend] Add `max-completion-token` option to transcription/translation endpoints (#30769 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-16 19:36:49 +00:00
Michael Goin	10ee1c64cf	[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test (#30723 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:28:34 -05:00

12348 Commits