xinyun/vllm - vllm - 丝路新云-代码仓

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-20 11:45:57 +08:00

Author	SHA1	Message	Date
Ming Yang	0f67d4d962	[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-24 10:24:08 -07:00
kourosh hakhamaneshi	7e1d697b56	[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366 ) Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-24 17:08:05 +00:00
Chendi.Xue	699d62e6cf	[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-24 17:01:41 +00:00
Richard Zou	cd390b609d	[compile] Turn standalone_compile back on (#27460 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-10-24 16:30:27 +00:00
Fadi Arafeh	2080b05099	[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype (#27472 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-10-24 15:57:48 +00:00
Lifans	6454afec90	[Doc] Fix minor issues in docs/design/metrics.md (#27436 ) Signed-off-by: Lifan Shen <lifans@meta.com>	2025-10-24 05:40:54 -07:00
Chauncey	41a62564a7	Fix test named tool use (#27458 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-24 20:27:45 +08:00
fhl2000	284cc92275	[MISC] `cudagraph_capture_sizes` related improvements (#26016 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-24 05:11:05 -07:00
ioana ghiban	435be10db9	Fix AArch64 CPU Docker pipeline (#27331 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-10-24 05:11:01 -07:00
Cyrus Leung	b7030d962b	[Benchmark] Enable benchmark to run with `encoding_format="bytes"` (#27467 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-24 11:16:50 +00:00
Chauncey	3567816932	[Refactor] move tool parsing logic from protocol.py to the tool parser (#27383 ) Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-10-24 09:53:23 +00:00
22quinn	e0ef8a2920	[BugFix] Fix torchrun DP with LLM class (#27395 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-24 08:11:37 +00:00
Isotr0py	42efe609ba	[MM][Bugfix] Replace `PatchEmbed`'s conv3d to linear layer (#27418 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-24 07:32:47 +00:00
Yu Jiaqi	88d3141ec6	[Docs] remove v1 column for embedding models (#27446 ) Signed-off-by: piood <2477084691@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-23 23:55:03 -07:00
Rui Qiao	09a6a49eaf	[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-24 14:53:09 +08:00
strinczer	074475541a	[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API (#26706 ) Signed-off-by: Shai Trinczer <strinczer@icloud.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-23 22:53:42 -07:00
Aaron Pham	d4c574c39f	[Chore] remove structural tags logging lines (#27451 )	2025-10-24 05:35:45 +00:00
usberkeley	c528b9006a	Fix EventPublisherFactory logic for disabled KV cache events (#27419 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-10-24 05:00:01 +00:00
fhl2000	85fee74b33	[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427 ) Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>	2025-10-23 20:31:14 -07:00
hfan	8dbe0c527f	[Misc] Add TPU usage report when using tpu_inference. (#27423 ) Signed-off-by: Hongmin Fan <fanhongmin@google.com>	2025-10-23 20:29:37 -07:00
Xiangyu Li	5cc6bddb6e	[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092 )	2025-10-23 23:26:13 -04:00
Harry Mellor	1f9460c4c1	Fix pooling adapters for Transformers backend (#27338 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-23 20:23:55 -07:00
xiao-llm	70022ffc00	Granite 4.0 quark quantization support (#26944 ) Signed-off-by: Xiao YU <Xiao.YU@xilinx.com> Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com> Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>	2025-10-24 02:14:03 +00:00
Akash kaothalkar	f417746ad7	[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc (#27422 ) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-10-23 21:21:36 +00:00
Yu Jiaqi	0552cfb195	[Model] Siglip Embedding Support (#27324 ) Signed-off-by: piood <2477084691@qq.com>	2025-10-23 20:19:48 +00:00
Kebe	51dd14ac2b	[Bugfix][DP] Fix creating too many DP Placement Groups (#26880 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-23 20:16:51 +00:00
Matthew Bonanni	dbfbf9f324	[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 (#27368 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-23 15:58:15 -04:00
Jonathan Chen	ca76486a16	[Chore] Separate out `vllm.utils.platform_utils.py` (#27374 ) Signed-off-by: Jonathan <chenleejonathan@gmail.com>	2025-10-23 19:08:06 +00:00
Varun Sundar Rabindranath	a9f55dc588	[Misc] Add triton_kernels dependency (#27370 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-23 12:04:14 -07:00
Isotr0py	81d5bb765a	[Bugfix] Fix AWQ marlin layer skipping (#27416 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-23 18:30:28 +00:00
Gregory Shtrasberg	0825197bee	[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek (#27373 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-23 17:43:53 +00:00
Alexander Matveev	9ef3d5b875	[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-10-24 00:03:14 +08:00
Alexei-V-Ivanov-AMD	295c7f0267	Mirroring the test definitions (2025-10-22) (#27362 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-10-24 00:02:26 +08:00
wang.yuqi	3fa2c12185	[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com>	2025-10-23 14:46:18 +00:00
Cyrus Leung	fe2016de2d	[CI/Build] Remove unnecessary flags from test registry (#27353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-23 14:42:40 +00:00
Ilya Markov	237cf6d32a	[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-10-23 20:58:39 +08:00
Navya Srivastava	faee3ccdc2	[Feature] Pydantic validation for speculative.py (#27156 ) Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-23 12:19:33 +00:00
Bradley D	570c3e1cd4	[Bugfix] Honor --mm_encoder_attn_backend when used (#27124 ) Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-23 20:09:52 +08:00
Harry Mellor	3a4255c7c4	Run mypy on the lowest supported Python version instead of system Python (#27048 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-23 05:07:44 -07:00
tomeras91	61089465a6	[Model] Add MoE support for NemotronH (#25863 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-10-23 10:27:23 +00:00
Tova Movshovitz	88afa11010	[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245 ) Signed-off-by: tovam <tovam@pliops.com>	2025-10-23 12:21:08 +02:00
Chauncey	d00ce29d89	[CI] Reorganize entrypoints tests (#27403 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-23 10:10:06 +00:00
Louie Tsai	3b7bdf983b	add SLA information into comparison graph for vLLM Benchmark Suite (#25525 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: louie-tsai <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-23 08:04:59 +00:00
Zhewen Li	50b788a17a	[CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-23 07:55:00 +00:00
Lucia Fang	fc059c7061	[Bugfix] Fix args settings for guided decoding args (#27375 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-10-23 07:34:06 +00:00
Cyrus Leung	bfb240cc49	[CI/Build] Fix Prithvi plugin test (#27393 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-23 07:30:44 +00:00
Jonathan Chen	e255d92990	[Chore] Remove duplicate `has_` functions in vllm.utils (#27372 ) Signed-off-by: Jonathan <chenleejonathan@gmail.com>	2025-10-23 06:11:59 +00:00
wang.yuqi	3729ed00ba	[Model] Add num_cached_tokens for PoolingRequestOutput (#27378 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-23 14:03:42 +08:00
Giancarlo Delfin	6644796bf4	[V1][spec decode] return logprobs for spec decoding (#26060 ) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-22 22:59:59 -07:00
Andrew Sansom	ff93cc8c84	[CORE] Support Prefix Caching with Prompt Embeds (#27219 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-10-22 22:18:07 -07:00

11777 Commits