Boyuan Feng | b158df2813 | 2025-11-08 01:13:13 +00:00
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>

Kunshang Ji | 1aaecda078 | 2025-11-08 00:33:11 +00:00
[XPU] Enable Expert parallel for MoE models (#28263)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Nick Hill | 67a2da890e | 2025-11-07 22:11:03 +00:00
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
Signed-off-by: Nick Hill <nhill@redhat.com>

Nick Hill | da786e339e | 2025-11-07 20:01:23 +00:00
[Core] Rework handling of async scheduling config (#28250)
Signed-off-by: Nick Hill <nhill@redhat.com>

Benjamin Chislett | 18903216f5 | 2025-11-07 19:28:04 +00:00
[Bugfix] Fix and add tests for GptOss reasoning parser (#28000)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

Nicolò Lucchesi | 68a72a5cc1 | 2025-11-07 15:07:01 +00:00
Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289)
Signed-off-by: NickLucche <nlucches@redhat.com>

Boyuan Feng | 0f872b7977 | 2025-11-07 09:43:30 -05:00
[Log] update shm wait time msg (#28255)

Wentao Ye | 4b1ff13221 | 2025-11-07 07:35:33 -05:00
[Feature] Default ignore_eos True for random dataset (#28227)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Iceber Gu | e0d6b4a867 | 2025-11-07 12:21:40 +00:00
[CLI] add --max-tokens to vllm complete (#28109)
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>

Pavani Majety | 72b1c2ae2c | 2025-11-07 04:18:39 -08:00
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>

Lukas Geiger | e0919f331d | 2025-11-07 12:14:29 +00:00
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

Kevin H. Luu | 8e19d470af | 2025-11-07 12:09:09 +00:00
[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

Mengqing Cao | 1958bda9b4 | 2025-11-07 19:38:38 +08:00
[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259)
Signed-off-by: MengqingCao <cmq0113@163.com>

Zhang Xiangze | 7bdb42b2f2 | 2025-11-07 11:03:57 +00:00
[CPU]Avoid repeated random sample compile (#28260)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>

汪志鹏 | 315068eb4a | 2025-11-07 09:35:22 +00:00
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265)
Signed-off-by: princepride <wangzhipeng628@gmail.com>

Jialin Ouyang | ccd98b59c1 | 2025-11-07 00:27:12 -08:00
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Jee Jee Li | 21b82f4ea2 | 2025-11-07 08:05:48 +00:00
[Kernel] LoRA triton kernels support PDL (#27402)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

baonudesifeizhai | 9da9208b20 | 2025-11-07 07:31:58 +00:00
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256)

smit kadvani | 11fd69dd54 | 2025-11-07 05:27:42 +00:00
[amd][gptoss] Perf gain because of block alignment (#28024)
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>

Harry Mellor | c0a4b95d64 | 2025-11-07 04:23:17 +00:00
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Lucas Kabela | 4bf56c79cc | 2025-11-07 00:16:03 +00:00
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

Junhong Liu | 59b453eaa2 | 2025-11-07 07:51:28 +08:00
Speed up mm processor kwargs per request by splitting dynamic and static kwargs (#26483)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>

Varun Sundar Rabindranath | ca6f755d24 | 2025-11-06 22:53:30 +00:00
[BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Aleksandr Malyshev | 449de9001a | 2025-11-06 14:46:44 -05:00
[ROCm] triton fp8 kernel (#27058)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>

Vico Chu | d4aa65c998 | 2025-11-06 19:09:19 +00:00
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792)
Signed-off-by: Vico Chu <vico24826@gmail.com>

Julien Denize | 7a8375f8a0 | 2025-11-06 18:55:17 +00:00
Add llama 4 scaling support (#28145)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>

Roy Wang | d1dd5f53e4 | 2025-11-06 16:25:39 +00:00
[Frontend] Fix logging format when enable response logging (#28049)
Signed-off-by: esmeetu <jasonailu87@gmail.com>

StanHatko | e52e4da971 | 2025-11-06 23:47:11 +08:00
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953)
Signed-off-by: Stan Hatko <stan_hatko@live.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>

Eric Yue | 0370679ce9 | 2025-11-06 07:29:46 -08:00
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>

xiangze-arm | c757a15f0f | 2025-11-06 11:04:18 +00:00
[CPU]Improve cpu fused moe perf (#27244)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>

Chauncey | 59a50afa08 | 2025-11-06 10:40:03 +00:00
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

wangxiyuan | c3ee80a01a | 2025-11-06 16:05:32 +08:00
[V0 deprecation]clean up is_v1_supported_oracle (#28116)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Aditya Tewari | 3755c14532 | 2025-11-06 07:32:05 +00:00
[CPU] Enable torch profiling (#28130)
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>

Seungduk Kim | 201dc98acc | 2025-11-05 23:07:36 -08:00
Fix hard-coded parameter name in gemma3n.py (#27946)
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

Julien Denize | a404e2c0f1 | 2025-11-06 06:43:16 +00:00
Patch Mistral Tokenizer (#28146)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>

Xiaozhu Meng | e31946f86e | 2025-11-06 05:52:16 +00:00
[flashinfer] fix FI all2all with FI cutlass moe (#28166)
Signed-off-by: Xiaozhu <mxz297@gmail.com>

Jacob Zhong | d72299d47b | 2025-11-06 05:08:55 +00:00
Make the cv2 dependency optional (#27780)
Signed-off-by: Jacob <cmpute@qq.com>

Lukas Geiger | 80679f108f | 2025-11-06 04:05:12 +00:00
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

Isotr0py | 43ecd0a900 | 2025-11-06 03:46:30 +00:00
[Chore] Clean up deepseek v2/v3 config copy (#28055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Chauncey | 07d614511f | 2025-11-05 21:07:47 -05:00
[Misc] Remove the duplicate code (#28111)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

Wentao Ye | d71af5f502 | 2025-11-05 17:21:08 -08:00
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | 90189c71a9 | 2025-11-05 17:04:20 -08:00
[Bug] Fix env string "0" same to True (#28159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | d79d9f0780 | 2025-11-05 17:03:09 -08:00
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM (#28157)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Vadim Gimpelson | b6a248bdd7 | 2025-11-05 17:01:12 -08:00
[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

Dayeol Lee | 1767658559 | 2025-11-05 16:52:52 -08:00
[Debugging] Add annotation for easier trace analysis (#22496)

Kuntai Du | efe73e9b57 | 2025-11-06 00:12:00 +00:00
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token (#25431)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

Zhewen Li | 5ee93a5956 | 2025-11-05 15:40:10 -08:00
[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948)
Signed-off-by: zhewenli <zhewenli@meta.com>

Snehlata | e15601789b | 2025-11-05 13:45:29 -08:00
[Feature]: Add corrupted request metric to V1 metrics system. (#27306)
Signed-off-by: atalhens <sneh.lata@nutanix.com>

Isotr0py | ffb08379d8 | 2025-11-05 20:06:45 +00:00
[Chore] Remove Nemotron-Nano-VL config copy (#28126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Michael Yao | 518ec6b722 | 2025-11-05 19:01:34 +00:00
[Docs] Clean up README_TUNING.md (#28088)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>