xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-03 07:24:26 +08:00

Author	SHA1	Message	Date
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
Kunshang Ji	1aaecda078	[XPU] Enable Expert parallel for MoE models (#28263 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 00:33:11 +00:00
Nick Hill	67a2da890e	[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 22:11:03 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Benjamin Chislett	18903216f5	[Bugfix] Fix and add tests for GptOss reasoning parser (#28000 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-07 19:28:04 +00:00
Nicolò Lucchesi	68a72a5cc1	Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012 )" (#28289 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-07 15:07:01 +00:00
Boyuan Feng	0f872b7977	[Log] update shm wait time msg (#28255 )	2025-11-07 09:43:30 -05:00
Wentao Ye	4b1ff13221	[Feature] Default `ignore_eos` True for `random` dataset (#28227 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-07 07:35:33 -05:00
Iceber Gu	e0d6b4a867	[CLI] add --max-tokens to `vllm complete` (#28109 ) Signed-off-by: Iceber Gu <caiwei95@hotmail.com>	2025-11-07 12:21:40 +00:00
Pavani Majety	72b1c2ae2c	[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-11-07 04:18:39 -08:00
Lukas Geiger	e0919f331d	[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-07 12:14:29 +00:00
Kevin H. Luu	8e19d470af	[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-07 12:09:09 +00:00
Mengqing Cao	1958bda9b4	[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-11-07 19:38:38 +08:00
Zhang Xiangze	7bdb42b2f2	[CPU]Avoid repeated random sample compile (#28260 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-07 11:03:57 +00:00
汪志鹏	315068eb4a	[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265 ) Signed-off-by: princepride <wangzhipeng628@gmail.com>	2025-11-07 09:35:22 +00:00
Jialin Ouyang	ccd98b59c1	[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-07 00:27:12 -08:00
Jee Jee Li	21b82f4ea2	[Kernel] LoRA triton kernels support PDL (#27402 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-07 08:05:48 +00:00
baonudesifeizhai	9da9208b20	[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256 )	2025-11-07 07:31:58 +00:00
smit kadvani	11fd69dd54	[amd][gptoss] Perf gain because of block alignment (#28024 ) Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com> Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>	2025-11-07 05:27:42 +00:00
Harry Mellor	c0a4b95d64	Fix issues from #28242 (#28257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 04:23:17 +00:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
Junhong Liu	59b453eaa2	Speed up mm processor kwargs per request by spliting dynamic and static kwargs (#26483 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com>	2025-11-07 07:51:28 +08:00
Varun Sundar Rabindranath	ca6f755d24	[BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-06 22:53:30 +00:00
Aleksandr Malyshev	449de9001a	[ROCm] triton fp8 kernel (#27058 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-11-06 14:46:44 -05:00
Vico Chu	d4aa65c998	[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792 ) Signed-off-by: Vico Chu <vico24826@gmail.com>	2025-11-06 19:09:19 +00:00
Julien Denize	7a8375f8a0	Add llama 4 scaling support (#28145 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-06 18:55:17 +00:00
Roy Wang	d1dd5f53e4	[Frontend] Fix logging format when enable response logging (#28049 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2025-11-06 16:25:39 +00:00
StanHatko	e52e4da971	[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953 ) Signed-off-by: Stan Hatko <stan_hatko@live.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-11-06 23:47:11 +08:00
Eric Yue	0370679ce9	[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-11-06 07:29:46 -08:00
xiangze-arm	c757a15f0f	[CPU]Improve cpu fused moe perf (#27244 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-06 11:04:18 +00:00
Chauncey	59a50afa08	[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-06 10:40:03 +00:00
wangxiyuan	c3ee80a01a	[V0 deprecation]clean up is_v1_supported_oracle (#28116 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-06 16:05:32 +08:00
Aditya Tewari	3755c14532	[CPU] Enable torch profiling (#28130 ) Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>	2025-11-06 07:32:05 +00:00
Seungduk Kim	201dc98acc	Fix hard-coded parameter name in gemma3n.py (#27946 ) Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-05 23:07:36 -08:00
Julien Denize	a404e2c0f1	Patch Mistral Tokenizer (#28146 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-06 06:43:16 +00:00
Xiaozhu Meng	e31946f86e	[flashinfer] fix FI all2all with FI cutlass moe (#28166 ) Signed-off-by: Xiaozhu <mxz297@gmail.com>	2025-11-06 05:52:16 +00:00
Jacob Zhong	d72299d47b	Make the cv2 dependency optional (#27780 ) Signed-off-by: Jacob <cmpute@qq.com>	2025-11-06 05:08:55 +00:00
Lukas Geiger	80679f108f	[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-06 04:05:12 +00:00
Isotr0py	43ecd0a900	[Chore] Clean up deepseek v2/v3 config copy (#28055 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 03:46:30 +00:00
Chauncey	07d614511f	[Misc] Remove the duplicate code (#28111 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-05 21:07:47 -05:00
Wentao Ye	d71af5f502	[Feature] Enable TP + EP `shared_experts` overlap with router, 3.7% E2E performance improvement (#28164 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:21:08 -08:00
Wentao Ye	90189c71a9	[Bug] Fix env string `"0"` same to `True` (#28159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:04:20 -08:00
Wentao Ye	d79d9f0780	[Bug] Fix cpu disable shared_experts `VLLM_DISABLE_SHARED_EXPERTS_STREAM` (#28157 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:03:09 -08:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
Dayeol Lee	1767658559	[Debugging] Add annotation for easier trace analysis (#22496 )	2025-11-05 16:52:52 -08:00
Kuntai Du	efe73e9b57	[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (#25431 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-06 00:12:00 +00:00
Zhewen Li	5ee93a5956	[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-05 15:40:10 -08:00
Snehlata	e15601789b	[Feature]: Add corrupted request metric to V1 metrics system. (#27306 ) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-11-05 13:45:29 -08:00
Isotr0py	ffb08379d8	[Chore] Remove Nemotron-Nano-VL config copy (#28126 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-05 20:06:45 +00:00
Michael Yao	518ec6b722	[Docs] Clean up README_TUNING.md (#28088 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-11-05 19:01:34 +00:00

7702 Commits