xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-16 16:57:21 +08:00

Author	SHA1	Message	Date
Wentao Ye	c1acd6d7d4	[Refactor] Change the way of import triton (#20774 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:39:55 -07:00
ElizaWszola	3b3b778d4a	[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs (#20825 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-12 19:39:14 -07:00
Wentao Ye	42d440c22b	[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:38:45 -07:00
Woosuk Kwon	f45a332886	[Sched] Enhance the logic to remove stopped requests from queues (#20739 )	2025-07-12 15:33:13 -07:00
Michael Goin	6e2c176e1f	[Bugfix] Restrict Machete to only run on Hopper (#20830 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-12 17:34:40 +00:00
Reid	a86754a12b	[docs] convert supported configs to table (#20858 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-12 06:54:50 -07:00
Alex Brooks	c2a2f19aba	[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-12 06:11:30 -07:00
Congcong Chen	2c11a738b3	[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702 ) Signed-off-by: Congcong Chen <congcongchen@microsoft.com>	2025-07-12 06:02:10 -07:00
Michael Goin	b639327ad9	Revert "Use NVCC --compress-mode to reduce binary size by 30% #20694 " (#20853 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 23:07:35 -07:00
Zhiyu	4afe687a82	Enable ModelOpt Llama4 fp8 checkpoint deployment (#20419 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-07-11 23:07:16 -07:00
Maximilien de Bayser	5de8d9f111	Remove extra tensor on CPU (#20693 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-07-12 14:06:34 +08:00
Boyuan Feng	c1c8ca57ff	[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile (#20790 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-07-11 23:06:13 -07:00
Richard Zou	a3a5a47e48	[Bugfix] Fix torch.compile x LoRA for PyTorch 2.8 (#20823 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-07-11 23:06:04 -07:00
Lucia Fang	fb25e95688	[Docs] Update basic.md (#20846 )	2025-07-11 23:05:32 -07:00
Wentao Ye	0d4891cd03	[Bug] Fix DeepGemm for EP low latency case (#20833 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-11 23:05:12 -07:00
lkchen	f56d2996ca	[Misc] Respect `no_use_tqdm_on_load` flag while capturing CUDA graph (#20834 ) Signed-off-by: Linkun <github@lkchen.net>	2025-07-11 23:04:45 -07:00
Isotr0py	147afb448b	[Bugfix] Replace unavailable video url in multimodal test (#20854 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-12 05:25:39 +00:00
Nicolò Lucchesi	3c7d942da8	[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-11 21:33:26 -07:00
Varun Sundar Rabindranath	890323dc1b	[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once (#20852 )	2025-07-11 20:56:24 -07:00
Isotr0py	01cae37713	[CI/Build] Ensure compatability with Transformers v4.53 (#20541 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-11 20:53:07 -07:00
yurhett	11c0198615	[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-07-11 20:52:43 -07:00
Li, Jiang	b1235c3e10	[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-11 20:52:05 -07:00
Jee Jee Li	44d02f54db	[Misc] Restrict deep_gemm's log output (#20827 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 20:50:42 -07:00
Trevor Morris	a8593237c0	Add pynccl all-gatherv and reducescatterv (#20154 ) Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 18:59:23 -07:00
Ilya Markov	fc0f41d10a	Integration SM100 FlashInfer fused allreduce RMSNorm (#20691 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-11 18:58:15 -07:00
Wentao Ye	7b828e30d5	[CI Bug] Fix Async Engine, Inputs, Utils, Worker Test: 'State' object has no attribute 'enable_server_load_tracking' (#20845 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-11 18:57:24 -07:00
bigmoyan	5f0af36af5	Update kimi-k2 tool calling docs, enable unit tests (#20821 ) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 20:16:14 +00:00
Isotr0py	0d21b2664c	[Bugfix] Fix OOM in language generation test (#20814 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-11 11:21:52 -07:00
Nick Hill	9907fc4494	[Docs] Data Parallel deployment documentation (#20768 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-11 09:42:10 -07:00
Michael Goin	d47661f0cd	[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 10:05:33 -06:00
Varun Sundar Rabindranath	53fa457391	[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-11 07:51:46 -07:00
Reid	6fb162447b	[doc] fix ordered list issue (#20819 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-11 06:49:46 -07:00
Li, Jiang	66177189c5	[Bugfix] Add missing field to TritonLanguagePlaceholder (#20812 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-11 05:25:11 -07:00
QiliangCui	b4f0b5f9aa	Temporarily suspend google/gemma-3-1b-it. (#20722 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-11 11:21:26 +00:00
Cyrus Leung	cbd14ed561	[Bugfix] Refactor `/invocations` to be task-agnostic (#20764 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-11 03:20:54 -07:00
Pavani Majety	7bd4c37ae7	[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: shuw <shuw@nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 09:23:23 +00:00
Jee Jee Li	8020e98c9f	[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 08:01:13 +00:00
Luka Govedič	762be26a8e	[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777 ) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com>	2025-07-11 00:15:22 -07:00
Reid	6a9e6b2abf	[doc] fold long code block (#20795 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-10 23:16:41 -07:00
nopperl	5d09152ff1	[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-07-11 05:53:31 +00:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830 ) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Ratnam Parikh	35514b682a	[XPU] XCCL support enabled in torch 2.8.0.dev nightly builds (#20705 ) Signed-off-by: ratnampa <ratnam.parikh@intel.com>	2025-07-10 20:39:52 -07:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087 )	2025-07-10 20:18:05 -07:00
Alexander Matveev	5b032352cc	[Attention] MLA - Flashinfer Ragged Prefill (#20034 )	2025-07-10 20:17:47 -07:00
Michael Goin	922f316441	[Model] Support HF format of minimax (#20211 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 02:55:21 +00:00
Duncan Moss	5923ab9524	[fix]: disable cutlass block scaled group gemm for EP (#20781 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com>	2025-07-11 02:39:18 +00:00
bigmoyan	0cf893cae1	Add kimi-k2 tool parser (#20789 ) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 10:36:23 +08:00
Michael Goin	cf75cd2098	[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install (#20772 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 01:16:01 +00:00
Simon Mo	b854321ffe	[Docs] Lazy import gguf (#20785 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-10 16:06:37 -07:00
Kuntai Du	5b6fe23d05	[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-10 14:52:46 -07:00

7666 Commits