xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-30 12:17:06 +08:00

Author	SHA1	Message	Date
Bangsheng Tang	848562bd49	break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265 ) Co-authored-by: Bangsheng Tang <bangsheng@meta.com>	2025-09-06 14:02:47 -07:00
Ye (Charlotte) Qi	a3645ed94d	[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-06 13:27:15 -07:00
Aaron Pham	fb691ee4e7	[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324 )	2025-09-06 19:10:32 +00:00
Ashwin Phadke	6024d115cd	Lora bias(enable_lora_bias) deprecate warning (#24339 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 00:42:19 +08:00
Isotr0py	00a4e56d8d	[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-06 09:23:12 -07:00
Roger Wang	eddaafc1c7	[Multimodal] Improve max video embedding length estimation in V1 (#24312 ) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-09-06 02:33:19 -07:00
Andrew Sansom	305a1cc0d2	refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-05 23:01:23 -07:00
wang.yuqi	6d6c6b05d3	[New Model]: google/embeddinggemma-300m (#24318 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-05 22:58:36 -07:00
Isotr0py	53b19ccdd5	[Core] Allow disabling TP sharding for parallel Linear layer (#23024 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-05 22:53:58 -07:00
Nick Hill	6432739ef1	[Bugfix] Catch and log invalid token ids in detokenizer (#24351 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-05 22:30:22 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734 ) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Yong Hoon Shin	3c529fc994	[KV Sharing] Raise error if using eagle with fast prefill (#24350 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-09-05 20:22:40 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
Benjamin Chislett	cee182b297	[Perf][V1] Fully overlap model execution (#23569 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-05 18:20:17 -07:00
Shiyan Deng	9dfbeb41e5	[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2025-09-05 14:14:18 -07:00
Chauncey	23a6c5280e	[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-05 10:26:00 -07:00
liuzhenwei	e599e2c65e	[XPU][P/D] Add XPU support in NixlConnector (#22436 ) Signed-off-by: zhenwei <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 21:03:12 -07:00
Aaron Pham	c29fb540ff	[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 20:39:12 -07:00
Nicolò Lucchesi	65e038931d	[Frontend] Skip unnecessary detokenization when token_id is requested (#24236 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-04 23:04:12 +00:00
Seiji Eicher	60b755cbcb	[Misc] Have AsyncLLM `custom_stat_loggers` extend default logger list (#20952 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-04 14:25:30 -07:00
Saman A. Pour	482e52f56c	QWEN3 Coder Fused MoE kernels Optimization configs (#24266 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-04 20:33:43 +00:00
Jee Jee Li	94866d7c93	[Misc] Slight improve deepgemm print (#24085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-04 16:06:51 +00:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
Nick Hill	e41a0fa377	[Perf] Freeze core engine proc heap after init (#24008 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-04 22:55:23 +08:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
Yash Pratap Singh	c9f7081f9c	[LoRA]: Add lora support to qwen-2.5-omni (#24231 )	2025-09-04 05:50:50 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
nopperl	2b30afa442	Use hidden_size_per_head as head_size fallback (#24221 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-09-04 12:59:16 +01:00
Jiangyun Zhu	eafa8dcde6	[Model] Add pp support for hunyuan (#24212 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-04 03:58:26 -07:00
Kebe	8f423e5f43	[Feature][Response API] Add streaming support for non-harmony (#23741 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-09-04 17:49:06 +08:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Fanli Lin	2c301ee2eb	[Bugfix] Fix Incremental Detokenization with `tokenizers == 0.22.0` (#24159 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 02:47:08 -07:00
whx	3efb9f4d95	[Attention][Platform] Refactor MLA to support Custom Op (#23332 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-09-04 02:46:37 -07:00
mgazz	51d5e9be7d	[Core][Model] Terratorch backend integration (#23513 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-04 00:22:41 -07:00
bingchen-mi	e7fc70016f	[Model] Add MiDashengLM model support (#23652 ) Signed-off-by: chenbing8 <chenbing8@xiaomi.com> Signed-off-by: bingchen-mi <chenbing8@xiaomi.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-04 00:08:09 -07:00
Li, Jiang	57b1ce94f7	[CPU] Refactor CPU unquantized linear (#24150 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-04 14:28:45 +08:00
Benji Beck	cb55ad86fe	Migrate ultravox inputs to TensorSchema (#23503 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-09-04 06:09:11 +00:00
Flora Feng	712b273f65	[Refactor] Introduce basic Renderer for completion-style request (#24010 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-04 05:21:12 +00:00
wuhang	a38f8bd54c	[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-09-04 04:05:10 +00:00
Peter Pan	b5ee1e3261	Remove deprecated `PyNcclConnector` (#24151 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-09-03 22:49:16 +00:00
George Nagy II	36c260dad6	[Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking (#23460 ) Signed-off-by: George Nagy II <george.nagy0969@gmail.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-03 21:08:47 +00:00
Kebe	a43a3f1770	[Bugfix][DP] DP distribution does not require ray[default] (#23822 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-09-03 13:21:36 -07:00
WeiQing Chen	6adaed42f4	[Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887 ) Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>	2025-09-03 19:14:30 +00:00
Matthew Bonanni	a742322092	[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-03 14:05:24 -04:00
Benji Beck	731a6940e3	Migrate whisper inputs to TensorSchema (#23505 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-09-03 18:04:00 +00:00
bnellnm	e9b92dcd89	[Kernels] Overlap shared experts with send/recv (#23273 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-03 12:35:18 -04:00
nopperl	fa4311d85f	[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998 ) Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp> Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com> Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>	2025-09-03 08:24:02 -07:00
Burkhard Ringlein	6d80ae83e1	[Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>	2025-09-03 15:01:09 +00:00
qscqesze	6997a25ac6	[Model] Remove useless code from MiniMax implementation (#23982 ) Signed-off-by: QscQ <qscqesze@gmail.com> Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-09-03 11:27:04 +00:00
Jakub Smid	28f350e147	Support add_generation_prompt in embeddings endpoint with chat request (#23931 ) Signed-off-by: biba10 <jaksmid@seznam.cz>	2025-09-03 10:47:55 +00:00

6257 Commits