xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-27 12:17:19 +08:00

Author	SHA1	Message	Date
Jee Jee Li	62f66be1f7	[Bugfix] Fix Qwen3-coder moe tuned config (#24072 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 05:19:46 +00:00
Ye (Charlotte) Qi	81c53ef55c	[Misc] collect flashinfer version in collect_env.py (#24378 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-07 03:30:41 +00:00
Saman A. Pour	75334956c2	QWEN3 Thinking Fused MoE kernels Optimization configs (#24330 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-07 03:18:54 +00:00
Jiangyun Zhu	77aec83b8c	[Benchmark] add benchmark for custom activation op (#23908 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-06 20:12:05 -07:00
Aaron Pham	e67597545b	[CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-09-07 11:10:40 +08:00
Benji Beck	37a6fa95fd	Migrate Qwen2 inputs to TensorSchema (#23475 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-06 20:07:31 -07:00
youkaichao	558f0907dc	[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-07 01:18:59 +00:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
Bangsheng Tang	848562bd49	break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265 ) Co-authored-by: Bangsheng Tang <bangsheng@meta.com>	2025-09-06 14:02:47 -07:00
elvischenv	e68dc2f014	[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-06 20:39:34 +00:00
Ye (Charlotte) Qi	a3645ed94d	[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-06 13:27:15 -07:00
Aaron Pham	fb691ee4e7	[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324 )	2025-09-06 19:10:32 +00:00
Ashwin Phadke	6024d115cd	Lora bias(enable_lora_bias) deprecate warning (#24339 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 00:42:19 +08:00
Jee Jee Li	7555d6b34a	[Bugfix] Fix test_mixtral_moe (#24371 )	2025-09-06 09:32:03 -07:00
Isotr0py	00a4e56d8d	[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-06 09:23:12 -07:00
mohankku	0eadaeff7e	[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335 ) Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com> Signed-off-by: mohankku <mohan.cbein@gmail.com>	2025-09-06 08:17:03 -07:00
Benjamin Chislett	0077c8634e	Add @benchislett to codeowner for spec decode and structured outputs (#24362 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-06 22:03:35 +08:00
Roger Wang	b121ca22ad	[CI] Disable flaky structured output test from CI (#24366 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-06 13:31:56 +00:00
Roger Wang	eddaafc1c7	[Multimodal] Improve max video embedding length estimation in V1 (#24312 ) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-09-06 02:33:19 -07:00
Andrew Sansom	305a1cc0d2	refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-05 23:01:23 -07:00
wang.yuqi	6d6c6b05d3	[New Model]: google/embeddinggemma-300m (#24318 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-05 22:58:36 -07:00
Isotr0py	53b19ccdd5	[Core] Allow disabling TP sharding for parallel Linear layer (#23024 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-05 22:53:58 -07:00
Nick Hill	6432739ef1	[Bugfix] Catch and log invalid token ids in detokenizer (#24351 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-05 22:30:22 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734 ) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Yong Hoon Shin	3c529fc994	[KV Sharing] Raise error if using eagle with fast prefill (#24350 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-09-05 20:22:40 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
22quinn	35efa70297	Add @22quinn as code reviewer for RL related components (#24346 )	2025-09-06 01:56:15 +00:00
Benjamin Chislett	cee182b297	[Perf][V1] Fully overlap model execution (#23569 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-05 18:20:17 -07:00
Rafael Vasquez	c954c6629c	[CI] Add timeouts to tests (#24260 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-05 17:26:22 -07:00
Shiyan Deng	9dfbeb41e5	[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2025-09-05 14:14:18 -07:00
elvischenv	eedb2a2a10	[Bugfix] Fix silu_mul+quant fusion test (#24341 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-05 20:13:42 +00:00
Chauncey	23a6c5280e	[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-05 10:26:00 -07:00
youkaichao	7812bcf278	[docs] add shenzhen meetup (#24326 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-05 22:48:42 +08:00
Louie Tsai	006e7a34ae	Adding int4 and int8 models for CPU benchmarking (#23709 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-09-05 20:08:50 +08:00
liuzhenwei	e599e2c65e	[XPU][P/D] Add XPU support in NixlConnector (#22436 ) Signed-off-by: zhenwei <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 21:03:12 -07:00
Aaron Pham	c29fb540ff	[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 20:39:12 -07:00
Nicolò Lucchesi	65e038931d	[Frontend] Skip unnecessary detokenization when token_id is requested (#24236 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-04 23:04:12 +00:00
Zhuohan Li	886ccbe5ba	[CI/Build] Reduce the number of redundant cases to test for LoRA (#24276 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-04 21:58:44 +00:00
elvischenv	adc3ddb430	[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 14:25:45 -07:00
Seiji Eicher	60b755cbcb	[Misc] Have AsyncLLM `custom_stat_loggers` extend default logger list (#20952 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-04 14:25:30 -07:00
Saman A. Pour	482e52f56c	QWEN3 Coder Fused MoE kernels Optimization configs (#24266 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-04 20:33:43 +00:00
Po-Han Huang (NVIDIA)	78336a0c3e	Upgrade FlashInfer to v0.3.0 (#24086 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 09:49:20 -07:00
Jee Jee Li	94866d7c93	[Misc] Slight improve deepgemm print (#24085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-04 16:06:51 +00:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
Nick Hill	e41a0fa377	[Perf] Freeze core engine proc heap after init (#24008 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-04 22:55:23 +08:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
Yash Pratap Singh	c9f7081f9c	[LoRA]: Add lora support to qwen-2.5-omni (#24231 )	2025-09-04 05:50:50 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
nopperl	2b30afa442	Use hidden_size_per_head as head_size fallback (#24221 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-09-04 12:59:16 +01:00
Jiangyun Zhu	eafa8dcde6	[Model] Add pp support for hunyuan (#24212 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-04 03:58:26 -07:00


9348 Commits