xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-03 08:17:09 +08:00

Author	SHA1	Message	Date
zzh142857	d16aa3dae4	[Model] Add option to run Step3VisionEncoder in DP (#22697 ) Signed-off-by: zzh142857 <chaorenzhaozhenghao@gmail.com>	2025-08-13 00:09:13 -07:00
Chen Zhang	6807af8f46	[gpt-oss] upgrade gpt-oss to v0.0.3 and add version check (#22768 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-12 21:37:26 -07:00
shixianc	4c558cf62e	[Perf] Support topk softmax fused kernel for broader num_experts (#22211 ) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-08-12 21:34:47 -07:00
Wentao Ye	77a6bf07ae	[Bug] Fix Unexpected Keyword Argument 'w1_bias' (#22757 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-12 21:31:47 -07:00
Michael Goin	4082338a25	Remove unneeded ROCm platform import when using CUDA (#22765 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-12 21:26:38 -07:00
Michael Goin	c6b928798e	Force TRTLLM attention for gpt-oss on SM100 (#22678 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-12 21:22:16 -07:00
Michael Goin	b1361c7273	[Bugfix] Fix default enable for CUTLASS MLA on SM100 (#22738 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-12 21:22:05 -07:00
Po-Han Huang (NVIDIA)	4f0f844b16	Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op (#22701 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-12 21:21:50 -07:00
Woosuk Kwon	c5830381af	[V0 Deprecation] Remove args for multi-step scheduling (#22779 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:38:18 -07:00
Woosuk Kwon	d31f97cf57	[Misc] Remove tests/multi_step/__init__.py (#22778 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:21:18 -07:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Michael Goin	e18859298d	Add hardware plugins to installation doc (#22732 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 17:14:46 -07:00
Jee Jee Li	fde0b611a3	[Model] Decouple glm4v (#22751 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-12 17:13:17 -07:00
Harry Mellor	d0a6301588	Fix Transformers backend tensor parallel for multimodal models (#22673 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 17:12:30 -07:00
Harry Mellor	45c3936e94	[Docs] Hide the navigation and toc sidebars on home page (#22749 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 17:12:26 -07:00
Frank Wang	ba81acbdc1	[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2025-08-12 15:43:06 -07:00
RUTHLESS-BOT	53c730286c	[Misc] parametrize 'dtype' in test_flash_mla (#22641 ) Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 16:31:48 -04:00
zifeitong	6534d2fc97	Fix torch version check for SM100 mxfp4 (#22535 ) Signed-off-by: Zifei Tong <zifeitong@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-12 12:54:42 -07:00
Nicolò Lucchesi	422f22e012	[CI][Nixl] Check kv cache layout during handshake (#22745 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 12:53:52 -07:00
Xiaozhu Meng	6bd8ebf026	[Kernel][AMD] Avoid D2H copy and cumsum kernel (#22683 ) Signed-off-by: Xiaozhu <mxz297@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 12:53:36 -07:00
Wentao Ye	dab4f9f764	[Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests (#22741 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-13 00:50:31 +08:00
TeeKen Lau	c42fe0b63a	Add more test scenario for tensor schema (#22733 ) Signed-off-by: teekenl <teekenlau@gmail.com>	2025-08-12 16:34:41 +00:00
Rahul Tuli	5a4b4b3729	Add: `SupportsEagle3` interface for explicit EAGLE3 support (#22642 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-08-12 09:24:52 -07:00
Daniel Serebrenik	e5d3d63c42	[Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) (#22730 ) Signed-off-by: daniels <daniels@pliops.com>	2025-08-12 14:41:37 +00:00
Nicolò Lucchesi	3d9d40efde	[Bugfix][CI] Fix `test_remote_decode_lifecycle.py::test_short_prompt_lifecycle` (#22727 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 07:30:17 -07:00
Po-Han Huang (NVIDIA)	67c153b88a	Fix Llama4 FlashInfer FP4 MoE issues (#22511 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-12 05:50:59 -07:00
wang.yuqi	f7ad6a1eb3	[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-12 05:42:58 -07:00
Harry Mellor	80bb1e8afe	Officially support SmolLM3 using the Transformers backend (#22665 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 05:38:48 -07:00
Nicolò Lucchesi	d030b01548	[BugFix][Nixl][PD] Fix heterogenous TP (#22663 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-08-12 05:37:30 -07:00
Harry Mellor	767e63b860	[Docs] Improve docs navigation (#22720 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 04:25:55 -07:00
Yongye Zhu	007dd90859	[gpt-oss] Enable gpt-oss on ampere (#22714 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-12 03:21:44 -07:00
Jee Jee Li	b8a9d0e429	[Misc] remove GH discussions link (#22722 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-12 03:15:33 -07:00
zejunchen-zejun	50f2aae1b4	[LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing (#21161 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-08-12 02:05:14 -07:00
RishiAstra	46ae7f6666	[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783 ) Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>	2025-08-12 02:04:37 -07:00
Jun-Howie	1ece7f30ba	Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" (#21888 ) Signed-off-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 02:03:53 -07:00
phantomlei	bc8372efc3	[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170 ) Signed-off-by: phantomlei <phantomlei3@gmail.com>	2025-08-12 02:03:22 -07:00
Sugar-zsg	8d17fa633e	[V0] Correct CUDA Graph capture for encoder-decoder models (#22630 )	2025-08-12 02:01:08 -07:00
dongluw	9f909b8996	[New Model] Support Command-A-Vision (#22660 ) Signed-off-by: donglu <donglu@cohere.com>	2025-08-12 01:39:54 -07:00
Chendi.Xue	59f3b93636	[DOC] update v1_guide with INTEL HW (#22679 ) Signed-off-by: Chendi.Xue <chendi.xue@intel.com>	2025-08-12 01:22:49 -07:00
Harry Mellor	78077d5417	Move `SchedulerConfig` from `config/__init__.py` to `config/scheduler.py` (#22626 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 00:23:49 -07:00
wang.yuqi	6d729c43fb	[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-12 00:23:17 -07:00
Sooraj S	2f4657952b	[doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f (#22707 ) Signed-off-by: Sooraj S <94284954+sooraj-satheesh@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-08-12 00:21:08 -07:00
Hongsheng Liu	3a7e3bbdd2	[Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models (#21737 ) Signed-off-by: Dilute-l <dilu2333@163.com> Co-authored-by: Dilute-l <dilu2333@163.com>	2025-08-12 00:14:51 -07:00
Harry Mellor	4fbd8bb597	Fix passing `SpeculativeConfig` from the CLI (#22652 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 22:13:32 -07:00
Chen Zhang	ad344ef552	[gpt-oss] Small bug fixes for frontend (#22512 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 22:04:38 -07:00
Chen Zhang	bbaf9e9cb1	[gpt-oss] Fix mxfp4 support (#22700 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 21:22:26 -07:00
Benji Beck	4678503476	Migrate MiniCPMVImageInputs to TensorSchema (#21939 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-11 20:43:37 -07:00
Michael Goin	93d0652433	[CI] Increase timeout for test_completion_with_image_embeds (#22670 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:31:36 -07:00
Michael Goin	ea1292ad3e	[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:20:42 -07:00
Po-Han Huang (NVIDIA)	dc5e4a653c	Upgrade FlashInfer to v0.2.11 (#22613 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-11 19:58:41 -07:00

8527 Commits