xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-24 13:47:00 +08:00

Author	SHA1	Message	Date
Cyrus Leung	b09806e28f	[Bugfix] Dictionary MM embeddings for online chat (#30507 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 15:48:56 +08:00
Tsukasa OI	fdc135d768	[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization (#30310 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-12-13 13:55:14 +08:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
Nicolò Lucchesi	57e9bf1864	[CI] Whisper logprobs tests (#30504 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-13 10:49:11 +08:00
Michael Goin	2f32a68d75	[CI] Update several models in registry that are available online now (#30514 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-12-12 18:28:13 -08:00
Matthew Bonanni	f5dfbbd8e9	[Docs] Remove references to `VLLM_ATTENTION_BACKEND` (#30564 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-13 10:20:15 +08:00
Michael Goin	fc0119425c	Add IBM and Red Hat to compute resources sponsors (#30581 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-12-13 01:34:23 +00:00
Matthew Bonanni	86a3261525	[Bugfix] Pass FA version in `MultiHeadAttention` (#30575 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-13 00:02:11 +00:00
rasmith	08f8a5627e	[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 18:41:56 -05:00
Kevin H. Luu	b4039c08b5	[ci] Mark PrimeRL integration test as soft fail (#30578 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-12-12 14:13:09 -08:00
Wentao Ye	1e6b115300	[Refactor] Reduce duplicate code in `per_token_group_quant` cuda kernels (#30496 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-12 16:45:23 -05:00
danielafrimi	13618626df	[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748 ) Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Signed-off-by: dafrimi <dafrimi@nvidia.com> Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-12 20:42:32 +00:00
danielafrimi	6ec0d8dbe4	[Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29980 ) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>	2025-12-12 11:27:47 -08:00
Li, Jiang	9693dd0fe3	[CI/Build] Add x86 CPU wheel release pipeline (#28848 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-12 19:21:35 +00:00
Xin Yang	1f19d8f899	[Perf] Set split_k to 1 for triton_kernels (#30528 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-12-12 14:07:57 -05:00
shivampr	cd7740ac5c	[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668 ) Signed-off-by: Shivam <shivampr.dev@gmail.com> Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-12 13:28:20 -05:00
Wentao Ye	02a5880394	[CI] Fix mypy for vllm/v1/executor (#30517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-12 18:05:34 +00:00
realliujiaxu	d2c919dcc2	[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059 ) Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-12-12 09:03:35 -08:00
Benjamin Bartels	f3237f3f6b	[Frontend] Fixes anthropic streaming message_start usage nesting (#30266 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-12-12 16:28:54 +00:00
jvlunteren	9c0ee995a8	[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com> Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-12-12 16:55:40 +01:00
Michael Goin	09ad3b76b3	[Bug] Fix attention_backend arg string parsing (#30534 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-12 08:40:50 -07:00
Christina Norman	dc13c99eed	fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408 ) Signed-off-by: Christina <truffle@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Christina Norman <christina@example.com> Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-12 23:10:12 +08:00
Vladislav Nosivskoy	3e34adcdfb	[DeepSeek V3.2] Proper drop_thinking logic (#30490 ) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2025-12-12 15:01:06 +00:00
Lucas Wilkinson	3e41992fec	[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-12 05:57:47 -08:00
吴坎	91401c7a26	[Bugfix] Fix CMakeLists Environment Variable (#21804 ) Signed-off-by: wu-kan <github@wu-kan.com> Signed-off-by: 吴坎 <github@wu-kan.cn> Signed-off-by: wu-kan <github@wu-kan.cn>	2025-12-12 10:54:52 +00:00
Jaehwang Jung	f90319d5d1	[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692 )	2025-12-12 02:27:20 -08:00
rasmith	302b2c1eb9	[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 09:30:23 +00:00
Ben Browning	8f8fda261a	[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-12 12:59:53 +08:00
Zhengxu Chen	fe1787107e	[compile] Parse compile range cache keys as Range during cache loading. (#30516 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-12 04:30:51 +00:00
Andreas Karatzas	783644e4ac	[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-12 03:54:56 +00:00
Ryan Rock	197473c4e7	[CI/Build] Use spawn subprocess for ROCm (#30272 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-12-12 03:33:17 +00:00
Nick Hill	947dfda9c2	[LMCache] Relax lmcache version requirement (#30425 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-11 18:18:47 -09:00
Michael Goin	9f2fc16a69	[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-12 02:53:57 +00:00
Bhanu Prakash Voutharoja	6a6fc41c79	gptq marlin quantization support for fused moe with lora (#30254 ) Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com>	2025-12-12 02:27:22 +00:00
Fadi Arafeh	f355ad5412	[CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-12 02:09:25 +00:00
Lucas Wilkinson	042da73244	[Core] Refactor `_build_attention_metadata` (#29628 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-11 17:54:12 -08:00
Andreas Karatzas	b5945d49c0	[ROCm][CI] Use mi325_4 agent pool for V1 e2e tests (#30526 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-12 01:37:24 +00:00
rasmith	ba80926681	[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-12 01:02:19 +00:00
jiahanc	0ab23c2b2b	[fix] fix SM check for Flashinfer TRTLLM MOE (#30314 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-12-12 01:00:58 +00:00
rasmith	48661d275f	[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 00:24:20 +00:00
Ev Lacey	d527cf0b3d	[FIX]Patch run-cluster.sh (fix for #28328 ) (#30002 ) Signed-off-by: elacey <elacey@nvidia.com> Signed-off-by: Ev Lacey <github@everettlacey.com>	2025-12-11 23:36:31 +00:00
Concurrensee	2cc5affc38	[ROCM][CI] Fix AMD Examples Test Group (#30276 ) Signed-off-by: Yida Wu <yida.wu@amd.com> Signed-off-by: Yida <yida.wu@amd.com>	2025-12-11 18:03:54 -05:00
Andrew Briand	a00d88973d	[EPLB] Support EPLB w/ NVFP4 (#29804 ) Signed-off-by: Andrew Briand <abriand@nvidia.com> Co-authored-by: Andrew Briand <abriand@nvidia.com>	2025-12-11 22:59:40 +00:00
Wentao Ye	61249b177d	[Refactor] Remove useless syncwarp (#30510 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-11 17:43:41 -05:00
Wentao Ye	c817b14151	[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement (#30494 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: li-jinpeng <3332126450@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-11 17:28:34 -05:00
ioana ghiban	3efdc3feae	[Docs][CPU backend] Add pre-built Arm CPU Docker images (#30491 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-12-11 22:03:29 +00:00
Nicolò Lucchesi	0efd9f867c	[Core] Whisper Enable Encoder Batching (#29421 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-11 21:06:51 +00:00
Xingyu Liu	90d6cf921f	[BugFix][MM]support VLLM_RANDOMIZE_DP_DUMMY_INPUTS (#30472 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-11 21:00:15 +00:00
Harry Mellor	cf3eacfe58	Standardise `get_rope` to use `rope_parameters["partial_rotary_factor"]`, not `rotary_dim` (#30389 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 20:45:23 +00:00
Zhengxu Chen	92fea56fd1	[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-11 20:28:03 +00:00

12213 Commits