xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-24 16:37:22 +08:00

Author	SHA1	Message	Date
Duncan Moss	5923ab9524	[fix]: disable cutlass block scaled group gemm for EP (#20781) Signed-off-by: Duncan Moss <djm.moss@gmail.com>	2025-07-11 02:39:18 +00:00
bigmoyan	0cf893cae1	Add kimi-k2 tool parser (#20789) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 10:36:23 +08:00
Michael Goin	cf75cd2098	[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install (#20772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 01:16:01 +00:00
Simon Mo	b854321ffe	[Docs] Lazy import gguf (#20785) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-10 16:06:37 -07:00
Kuntai Du	5b6fe23d05	[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-10 14:52:46 -07:00
Varun Sundar Rabindranath	f0c98cae27	[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 14:40:38 -07:00
Nick Hill	574ad60db9	[KVConnector] Always call connector `clear_metadata()` at end of step (#20756) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: David Ben-David <sdavidbd@gmail.com>	2025-07-10 22:37:27 +01:00
Varun Sundar Rabindranath	fdadb6f43a	[Bugfix] Fused MoE Modular Kernel chunking loop (#20392) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 20:31:10 +00:00
Alex Brooks	41060c6e08	[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-10 21:09:37 +01:00
Ming Yang	3de2ed767f	[Bugfix] Remove assertion of expert_map being None (#20714) Signed-off-by: Ming Yang <yming@meta.com> Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-10 19:55:22 +00:00
Wentao Ye	299252ea82	[CI] Fix pre commit issue (#20782) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-10 12:48:13 -07:00
Nathan Hoos	d6902ce79f	[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975) Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>	2025-07-10 15:30:26 -04:00
Sanger Steel	5e53c89a74	[Bugfix] [CI] Fix Tensorizer LoRA test (#20760) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-07-10 19:07:06 +00:00
QiliangCui	c66e38ea4c	[Test] Remove docker build from test. (#20542) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-10 11:21:58 -07:00
sfbemerk	251595368f	Fix DeepSeek-R1-0528 chat template (#20717) Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>	2025-07-10 17:47:36 +00:00
shineran96	4bed167768	[Model][VLM] Support JinaVL Reranker (#20260) Signed-off-by: shineran96 <shinewang96@gmail.com>	2025-07-10 10:43:43 -07:00
Asher	b140416abf	[Model] Add reason parser for Hunyuan A13B Model. (#20625) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-10 16:33:26 +00:00
Gregory Shtrasberg	5b8366b61a	[ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-10 09:22:23 -07:00
nishith-fujitsu	c7753a9809	[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129) Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>	2025-07-10 15:59:04 +00:00
Michael Goin	4b9a9435bb	Update Dockerfile FlashInfer to v0.2.8rc1 (#20718) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-10 08:09:02 -07:00
Harry Mellor	3482fd7e4e	[Doc] Add engine args back in to the docs (#20674) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-10 08:02:40 -07:00
Isotr0py	77f77a951e	[Misc] Clean up mark to fork process in BNB tests (#20692) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-10 13:59:40 +00:00
Michael Goin	1a4f35e2ea	Normalize lm-eval command between baseline and correctness test (#18560) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-10 13:27:32 +00:00
Michael Goin	be1e128dfb	[CI Bugfix] Skip failing Tensorizer+LoRA test (#20724)	2025-07-10 21:15:03 +09:00
Reid	65393ee064	[doc] fix ordered list (#20749) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-10 03:13:52 -07:00
Gregory Shtrasberg	dc221ad72d	[Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined (#20738) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-10 02:58:11 -07:00
Jee Jee Li	7571a4a7e5	[CI/Build] Fix Basic Models Test (#20728) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-10 09:57:19 +00:00
Isotr0py	f67d986dd1	[Misc] loose new-model tagger conditions (#20747) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-10 02:54:47 -07:00
Or Ozeri	cc876d0f29	[KVConnector] Aggregate finished requests on the scheduler (#19555) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-07-10 09:22:18 +01:00
Chenyaaang	fdfd409f8f	[TPU][Core]Make load weight exceed hbm error more instructive for customers (#20644) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-07-10 07:01:17 +00:00
Nick Hill	ffbcc9e757	[BugFix] Fix `VllmConfig()` construction on all platforms (#20695) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-10 07:00:20 +00:00
Nick Hill	59389c927b	[BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-10 14:24:20 +08:00
Chauncey	8f2720def9	[Frontend] Support Tool Calling with both `tool_choice='required'` and `$defs`. (#20629) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-10 13:56:35 +08:00
Seiji Eicher	ad6c2e1a0b	Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment (#20665) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-07-09 20:34:40 -07:00
Michael Goin	49e8c7ea25	Use NVCC `--compress-mode` to reduce binary size by 30% (#20694) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-09 18:26:48 -07:00
Varun Sundar Rabindranath	805d62ca88	[Misc] DP : Add ExpertTokensMetadata (#20332) Signed-off-by: Varun <vsundarr@redhat.com> Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-07-10 00:33:14 +00:00
Michael Goin	b7d9e9416f	[CI/Build] Fix FlashInfer double build in Dockerfile (#20651) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-09 17:41:56 -06:00
Woosuk Kwon	7c12a765aa	[Misc] Simplify the prefix caching logic on draft tokens (#20701) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-09 14:48:35 -07:00
Yiming	cd587c93ef	[BugFix]: Properly set engine_id when using multi connector (#19487) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: leiyiming <leiyiming@kingsoft.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-09 20:32:44 +00:00
fxmarty-amd	332d4cb17b	[Feature][Quantization] MXFP4 support for MOE models (#17888) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: Bowen Bao <bowenbao@amd.com> Signed-off-by: Felix Marty <Felix.Marty@amd.com> Co-authored-by: Bowen Bao <bowenbao@amd.com>	2025-07-09 13:19:02 -07:00
Jacob Manning	bf03ff3575	[Kernel] Add Conch backend for mixed-precision linear layer (#19818) Signed-off-by: Jacob Manning <jmanning+oss@stackav.com>	2025-07-09 13:17:55 -07:00
Tuan, Hoang-Trong	47043eb678	[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-09 12:53:55 -07:00
Michael Goin	31b96d1c64	Support Llama 4 for cutlass_moe_fp4 (#20453) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-09 15:53:38 -04:00
Li, Jiang	e59ba9e142	[CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-09 17:48:52 +00:00
Harry Mellor	403b481573	Remove heading form installation `inc.md` file (#20697) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-09 10:42:51 -07:00
Li, Jiang	138709f8d1	[Doc] Update CPU doc (#20676) Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-09 10:28:30 -07:00
Michael Goin	0bbac1c1b4	[Bench] Add NVFP4 GEMM benchmark script (#20578) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-09 13:23:48 -04:00
Liangliang Ma	a3e4e85ece	[XPU][CI] enhance xpu test support (#20652) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>	2025-07-09 16:53:09 +00:00
Chengji Yao	eb58f5953d	[TPU][Bugfix] fix test_pallas (#20666) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-09 09:32:48 -07:00
Sanger Steel	4ac9c33f78	[Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-07-09 15:36:37 +00:00

7621 Commits