xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-05 03:47:08 +08:00

Author	SHA1	Message	Date
Luka Govedič	3597b06a4f	[CUDA] Enable full cudagraph for FlashMLA (#18581 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-13 18:12:26 +00:00
qscqesze	a24cb91600	[Model] Fix minimax model cache & lm_head precision (#19592 ) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-06-13 12:08:20 +00:00
Nick Hill	7e8d97dd3f	[BugFix] Honor `enable_caching` in connector-delayed kvcache load case (#19435 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-13 09:46:32 +00:00
youkaichao	d70bc7c029	[torch.compile] reorganize the cache directory to support compiling multiple models (#19064 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-06-13 15:23:25 +08:00
Boyuan Feng	ce688ad46e	use base version for version comparison (#19587 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-06-13 15:09:34 +08:00
汪志鹏	cefdb9962d	[Fix] The zip function in Python 3.9 does not have the strict argument (#19549 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-13 14:57:48 +08:00
Li, Jiang	6458721108	[CPU] Refine default config for the CPU backend (#19539 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-13 13:27:39 +08:00
Hyogeun Oh (오효근)	bb4a0decef	[Misc] Correct broken docs link (#19553 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-06-12 22:27:13 -07:00
qizixi	c68698b326	[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-06-12 23:09:19 -04:00
Varun Sundar Rabindranath	e3b12667d4	[BugFix] : Fix Batched DeepGemm Experts (#19515 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 20:43:02 -06:00
Russell Bryant	c57bb199b3	[V1] Resolve failed concurrent structured output requests (#19565 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-06-12 23:30:09 +00:00
Michael Goin	a3319f4f04	[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (#19452 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-12 15:39:15 -04:00
Varun Sundar Rabindranath	9d880f594d	[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506 )	2025-06-12 18:01:16 +00:00
Ekagra Ranjan	017ef648e9	[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847 )	2025-06-12 10:30:56 -07:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00
mobicham	96846bb360	Fix TorchAOConfig skip layers (#19265 ) Signed-off-by: mobicham <hicham@mobiuslabs.com>	2025-06-12 22:22:53 +08:00
Nicolò Lucchesi	1129e2b1ab	[V1][NixlConnector] Drop `num_blocks` check (#19532 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-06-12 12:36:14 +00:00
Jee Jee Li	73e2e0118f	[Quantization] Improve AWQ logic (#19431 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-12 11:02:11 +00:00
jmswen	c9280e6346	[Bugfix] Respect num-gpu-blocks-override in v1 (#19503 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-12 11:00:23 +00:00
Michael Goin	af09b3f0a0	[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-12 10:40:24 +00:00
rasmith	2e090bd5df	[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-12 07:14:24 +00:00
wonjun Jang	1b0b065eb5	[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522 ) Signed-off-by: strutive07 <strutive07@gmail.com>	2025-06-12 07:00:47 +00:00
Nick Hill	d5bdf899e4	[BugFix] Work-around incremental detokenization edge case error (#19449 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-12 06:43:20 +00:00
22quinn	7e3e74c97c	[Frontend] Improve error message in tool_choice validation (#19239 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-12 01:13:00 -04:00
Brayden Zhong	3f6341bf7f	Add Triton Fused MoE kernel config for E=16 on B200 (#19518 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-12 04:31:51 +00:00
Varun Sundar Rabindranath	e5d35d62f5	[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 04:28:12 +00:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Robert Shaw	97a9465bbc	[UX] Add Feedback During CUDAGraph Capture (#19501 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-06-11 21:09:05 +00:00
rasmith	c7ea0b56cd	[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-11 15:53:28 -04:00
bnellnm	29fa5cac1c	[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-11 12:53:10 -04:00
Jee Jee Li	04a55612dd	[Misc] Fix misleading ROCm warning (#19486 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-12 00:12:10 +08:00
Ximingwang-09	3c8694eabe	Fix some typo (#19475 ) Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com> Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-06-11 10:36:04 +00:00
Michael Goin	7484e1fce2	Add cache to cuda get_device_capability (#19436 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-11 17:37:05 +08:00
Cyrus Leung	a2142f0196	Support non-string values in JSON keys from CLI (#19471 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-11 09:34:04 +00:00
Lu Fang	871d6b7c74	[Misc] Reduce warning message introduced in env_override (#19476 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-11 17:29:54 +08:00
Cyrus Leung	68b4a26149	[Doc] Update V1 User Guide for Hardware and Models (#19474 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-11 00:49:06 -07:00
artetaout	b8e809a057	[Kernel] Support deep_gemm for linear methods (#19085 ) Signed-off-by: artetaout <lulala341@gmail.com>	2025-06-11 15:14:45 +08:00
Junhao Li	2d40665fe8	Add fused MOE config for Qwen3 30B A3B on B200 (#19455 ) Signed-off-by: Junhao Li <junhao@ubicloud.com>	2025-06-11 13:43:46 +08:00
Lukas Geiger	96ada386b7	[Misc] Remove unused `MultiModalHasher.hash_prompt_mm_data` (#19422 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-06-11 05:18:57 +00:00
wang.yuqi	3952731e8f	[New Model]: Support Qwen3 Embedding & Reranker (#19260 )	2025-06-10 20:07:30 -07:00
Richard Zou	77f0d465d0	[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-11 07:54:41 +08:00
Xu Wenqing	22c3c0aa4a	Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-06-11 07:23:57 +08:00
py-andy-c	33f8dba7c6	[Model] use AutoWeightsLoader for commandr (#19399 ) Signed-off-by: py-andy-c <pychen1017@gmail.com>	2025-06-10 22:42:21 +00:00
Gregory Shtrasberg	5241ca50d6	[ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#19440 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-06-10 22:06:15 +00:00
Jee Jee Li	b6553be1bc	[Misc] Slight improvement of the BNB (#19418 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-10 13:51:49 +00:00
Rachel Guo	467bef18a3	[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword `use_irope` (#19134 ) Signed-off-by: Yunqiu Guo <guorachel@meta.com>	2025-06-10 16:48:51 +08:00
Isotr0py	5f1ac1e1d1	Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404 )	2025-06-10 01:30:20 -07:00
Louie Tsai	9368cc90b2	Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-06-10 06:22:05 +00:00
Lukas Geiger	319cb1e351	[Core] Batch multi modal input using pinned memory (#19169 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-06-10 13:44:59 +08:00
Li Wang	1efef71645	[Bugfix] Fix modelscope token passed in (#19389 ) Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-10 13:39:37 +08:00


4808 Commits