xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-24 14:27:18 +08:00

Author	SHA1	Message	Date
Reid	6fa718a460	[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-14 16:54:52 +08:00
Lu Fang	06be858828	[Bugfix] Fix the speculative decoding test by setting the target dtype (#19633)	2025-06-13 20:57:32 -07:00
Saheli Bhattacharjee	d1e34cc9ac	[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354) Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>	2025-06-14 11:07:36 +08:00
Nick Hill	bd517eb9fe	[BugFix] Fix DP Coordinator incorrect debug log message (#19624) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-14 00:18:03 +00:00
Concurrensee	d65668b4e8	Adding "AMD: Multi-step Tests" to amdproduction. (#19508) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-06-13 17:08:51 -07:00
Woosuk Kwon	aafbbd981f	[torch.compile] Use custom ops when use_inductor=False (#19618)	2025-06-13 15:05:54 -07:00
Anna Pendleton	0f0874515a	[Doc] Add troubleshooting section to k8s deployment (#19377) Signed-off-by: Anna Pendleton <pendleton@google.com>	2025-06-13 21:47:51 +00:00
Luka Govedič	3597b06a4f	[CUDA] Enable full cudagraph for FlashMLA (#18581) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-13 18:12:26 +00:00
Reid	1015296b79	[doc][mkdocs] fix the duplicate Supported features sections in GPU docs (#19606) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-13 16:25:08 +00:00
Wentao Ye	ce9dc02c93	[Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (#19573) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-13 06:12:15 -07:00
qscqesze	a24cb91600	[Model] Fix minimax model cache & lm_head precision (#19592) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-06-13 12:08:20 +00:00
Nick Hill	7e8d97dd3f	[BugFix] Honor `enable_caching` in connector-delayed kvcache load case (#19435) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-13 09:46:32 +00:00
youkaichao	d70bc7c029	[torch.compile] reorganize the cache directory to support compiling multiple models (#19064) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-06-13 15:23:25 +08:00
Boyuan Feng	ce688ad46e	use base version for version comparison (#19587) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-06-13 15:09:34 +08:00
汪志鹏	cefdb9962d	[Fix] The zip function in Python 3.9 does not have the strict argument (#19549) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-13 14:57:48 +08:00
汪志鹏	ace5cdaff0	[Fix] bump mistral common to support magistral (#19533) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-12 22:28:12 -07:00
Li, Jiang	6458721108	[CPU] Refine default config for the CPU backend (#19539) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-13 13:27:39 +08:00
Hyogeun Oh (오효근)	bb4a0decef	[Misc] Correct broken docs link (#19553) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-06-12 22:27:13 -07:00
Reid	c707cfc12e	[doc] fix incorrect link (#19586) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-13 04:26:09 +00:00
Aaron Pham	7b3c9ff91d	[Doc] uses absolute links for structured outputs (#19582) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-06-13 03:35:17 +00:00
qizixi	c68698b326	[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570) Signed-off-by: qizixi <qizixi@meta.com>	2025-06-12 23:09:19 -04:00
Varun Sundar Rabindranath	e3b12667d4	[BugFix] : Fix Batched DeepGemm Experts (#19515) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 20:43:02 -06:00
kourosh hakhamaneshi	e6aab5de29	Revert "[Build/CI] Add tracing deps to vllm container image (#15224)" (#19378)	2025-06-12 17:26:40 -07:00
Russell Bryant	c57bb199b3	[V1] Resolve failed concurrent structured output requests (#19565) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-06-12 23:30:09 +00:00
Aaron Pham	dba68f9159	[Doc] Unify structured outputs examples (#18196) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-06-12 22:50:31 +00:00
Michael Goin	a3319f4f04	[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (#19452) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-12 15:39:15 -04:00
Varun Sundar Rabindranath	9d880f594d	[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506)	2025-06-12 18:01:16 +00:00
Ekagra Ranjan	017ef648e9	[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)	2025-06-12 10:30:56 -07:00
Reid	4b25ab14e2	[doc] Make top navigation sticky (#19540) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-12 15:48:11 +00:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00
mobicham	96846bb360	Fix TorchAOConfig skip layers (#19265) Signed-off-by: mobicham <hicham@mobiuslabs.com>	2025-06-12 22:22:53 +08:00
Wentao Ye	b6efafd9e4	[Perf] Vectorize static / dynamic INT8 quant kernels (#19233) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-12 06:51:41 -07:00
Nicolò Lucchesi	1129e2b1ab	[V1][NixlConnector] Drop `num_blocks` check (#19532) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-06-12 12:36:14 +00:00
Cyrus Leung	c742438f8b	[Doc] Add V1 column to supported models list (#19523) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-12 19:16:44 +08:00
Jee Jee Li	73e2e0118f	[Quantization] Improve AWQ logic (#19431) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-12 11:02:11 +00:00
jmswen	c9280e6346	[Bugfix] Respect num-gpu-blocks-override in v1 (#19503) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-12 11:00:23 +00:00
Michael Goin	af09b3f0a0	[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-12 10:40:24 +00:00
Russell Bryant	4f6c42fa0a	[Security] Prevent new imports of (cloud)pickle (#18018) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>	2025-06-12 10:30:17 +00:00
niu_he	dff680001d	Fix typo (#19525) Signed-off-by: 2niuhe <carlton2tang@gmail.com>	2025-06-12 09:24:45 +00:00
rasmith	2e090bd5df	[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-12 07:14:24 +00:00
wonjun Jang	1b0b065eb5	[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522) Signed-off-by: strutive07 <strutive07@gmail.com>	2025-06-12 07:00:47 +00:00
Nick Hill	d5bdf899e4	[BugFix] Work-around incremental detokenization edge case error (#19449) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-12 06:43:20 +00:00
22quinn	7e3e74c97c	[Frontend] Improve error message in tool_choice validation (#19239) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-12 01:13:00 -04:00
Brayden Zhong	3f6341bf7f	Add Triton Fused MoE kernel config for E=16 on B200 (#19518) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-12 04:31:51 +00:00
Varun Sundar Rabindranath	e5d35d62f5	[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 04:28:12 +00:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Richard Zou	42f52cc95b	[CI/Build] Fix torch nightly CI dependencies (#19505) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-06-11 14:40:42 -07:00
Robert Shaw	97a9465bbc	[UX] Add Feedback During CUDAGraph Capture (#19501) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-06-11 21:09:05 +00:00
rasmith	c7ea0b56cd	[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-11 15:53:28 -04:00
bnellnm	29fa5cac1c	[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-11 12:53:10 -04:00

7126 Commits