xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-25 11:11:21 +08:00

Author	SHA1	Message	Date
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Jee Jee Li	fde0b611a3	[Model] Decouple glm4v (#22751) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-12 17:13:17 -07:00
Harry Mellor	d0a6301588	Fix Transformers backend tensor parallel for multimodal models (#22673) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 17:12:30 -07:00
zifeitong	6534d2fc97	Fix torch version check for SM100 mxfp4 (#22535) Signed-off-by: Zifei Tong <zifeitong@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-12 12:54:42 -07:00
Nicolò Lucchesi	422f22e012	[CI][Nixl] Check kv cache layout during handshake (#22745) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 12:53:52 -07:00
Xiaozhu Meng	6bd8ebf026	[Kernel][AMD] Avoid D2H copy and cumsum kernel (#22683) Signed-off-by: Xiaozhu <mxz297@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 12:53:36 -07:00
Rahul Tuli	5a4b4b3729	Add: `SupportsEagle3` interface for explicit EAGLE3 support (#22642) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-08-12 09:24:52 -07:00
Po-Han Huang (NVIDIA)	67c153b88a	Fix Llama4 FlashInfer FP4 MoE issues (#22511) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-12 05:50:59 -07:00
wang.yuqi	f7ad6a1eb3	[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-12 05:42:58 -07:00
Harry Mellor	80bb1e8afe	Officially support SmolLM3 using the Transformers backend (#22665) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 05:38:48 -07:00
Nicolò Lucchesi	d030b01548	[BugFix][Nixl][PD] Fix heterogenous TP (#22663) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-08-12 05:37:30 -07:00
Yongye Zhu	007dd90859	[gpt-oss] Enable gpt-oss on ampere (#22714) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-12 03:21:44 -07:00
RishiAstra	46ae7f6666	[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783) Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>	2025-08-12 02:04:37 -07:00
Jun-Howie	1ece7f30ba	Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" (#21888) Signed-off-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 02:03:53 -07:00
Sugar-zsg	8d17fa633e	[V0] Correct CUDA Graph capture for encoder-decoder models (#22630)	2025-08-12 02:01:08 -07:00
dongluw	9f909b8996	[New Model] Support Command-A-Vision (#22660) Signed-off-by: donglu <donglu@cohere.com>	2025-08-12 01:39:54 -07:00
Harry Mellor	78077d5417	Move `SchedulerConfig` from `config/__init__.py` to `config/scheduler.py` (#22626) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 00:23:49 -07:00
wang.yuqi	6d729c43fb	[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-12 00:23:17 -07:00
Harry Mellor	4fbd8bb597	Fix passing `SpeculativeConfig` from the CLI (#22652) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 22:13:32 -07:00
Chen Zhang	ad344ef552	[gpt-oss] Small bug fixes for frontend (#22512) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 22:04:38 -07:00
Chen Zhang	bbaf9e9cb1	[gpt-oss] Fix mxfp4 support (#22700) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 21:22:26 -07:00
Benji Beck	4678503476	Migrate MiniCPMVImageInputs to TensorSchema (#21939) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-11 20:43:37 -07:00
Andy Chen	9b94d6ec8f	Enable 4bit bnb prequant MOE (#21548) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-11 19:02:14 -07:00
Chen Zhang	95a935fc48	[gpt-oss] Support streaming in response API (#22431) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 17:46:59 -07:00
Harry Mellor	458e74eb90	Support more parallel styles in Transformers backend TP (#22651) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 10:42:48 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
wang.yuqi	84cf78acee	[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-11 09:41:37 -07:00
GuanLuo	16fb668b61	fix: NIXL connector transfers partial block to pass full multi-modal context (#21074) Signed-off-by: GuanLuo <gluo@nvidia.com>	2025-08-11 09:40:55 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Isotr0py	8e13d9fe6d	[Misc] Further clean up some redundant config definitions (#22649) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-11 09:22:25 -07:00
danielafrimi	14a5d903ab	[Model] NemotronH Support (#22349) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>	2025-08-11 04:09:24 -07:00
Cyrus Leung	951b038298	[Misc] Move jsontree to utils (#22622) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 03:49:32 -07:00
Harry Mellor	bc1d02ac85	[Docs] Add comprehensive CLI reference for all large `vllm` subcommands (#22601) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 00:13:33 -07:00
JartX	1e55dfa7e5	[BUGFIX] KeyError 'layers.14.mlp.gate.g_idx' for Qwen3-MoE with GPTQ on ROCm (#22017)	2025-08-11 00:13:30 -07:00
Maximilien de Bayser	39052dbca8	Support token_type_ids in V1 with less code changes (#21985) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-10 22:54:59 -07:00
vllmellm	9c97a1c349	[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (#22521) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-08-10 22:52:34 -07:00
Eugene Cheah	f919d4cb8f	[BugFix] Fix logits repetition penalty cuda check (#22592)	2025-08-10 22:52:31 -07:00
Zhewen Li	afa5b7ca0b	[Misc][gpt-oss] guard import when triton kernel when not up to date (#22584) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-08-10 21:29:35 -07:00
Nick Hill	5898b135ab	[BugFix] Fix KVConnectorOutput TPU breakage (#22598) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-10 19:33:48 -07:00
Benji Beck	06da44f0cb	Migrate LlavaImageInputs to TensorSchema (#21770) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-10 19:29:19 -07:00
Benji Beck	a554991748	Migrate LlavaNextVideoPixelInputs to TensorSchema (#21843) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-10 19:29:16 -07:00
Doug Smith	d1af8b7be9	enable Docker-aware precompiled wheel setup (#22106) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-08-10 16:29:02 -07:00
ZiTian Zhao	8c50d62f5a	Remove redundant row_indices unsqueeze operation in MiniCPMO (#22528) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-08-10 09:20:00 -07:00
Benji Beck	b4e2916721	Migrate LlavaNextImageInputs to TensorSchema (#21774) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-10 09:05:21 -07:00
Breno Baldas Skuk	65a7917be4	Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks (#22534) Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>	2025-08-10 09:03:15 -07:00
Isotr0py	b76753f0b5	[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-10 09:00:36 -07:00
Harry Mellor	8290d15d2c	Move `CacheConfig` from `config/__init__.py` to `config/cache.py` (#22586) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-10 07:36:40 -07:00
Harry Mellor	00976db0c3	[Docs] Fix warnings in docs build (#22588) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-10 05:49:51 -07:00
Cyrus Leung	d411df0296	[Misc] Further refine type annotations in parallel state (#22499) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-10 05:49:48 -07:00
Isotr0py	7e8d685775	[Minor] Fix pre-commit error on main (#22579) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-10 00:08:23 -07:00

5800 Commits