Copilot
|
a736e5ff77
|
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074)
|
2025-11-07 15:58:16 +08:00 |
|
baonudesifeizhai
|
9da9208b20
|
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256)
|
2025-11-07 07:31:58 +00:00 |
|
smit kadvani
|
11fd69dd54
|
[amd][gptoss] Perf gain because of block alignment (#28024)
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
|
2025-11-07 05:27:42 +00:00 |
|
Harry Mellor
|
c0a4b95d64
|
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 04:23:17 +00:00 |
|
Alexis MacAskill
|
a47d94f18c
|
Add runai model streamer e2e test for GCS (#28079)
Signed-off-by: Alexis MacAskill <amacaskill@google.com>
|
2025-11-07 03:07:54 +00:00 |
|
Alex Brooks
|
e70fbc599b
|
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) (#28247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-07 02:51:27 +00:00 |
|
Lucas Kabela
|
4bf56c79cc
|
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-07 00:16:03 +00:00 |
|
Junhong Liu
|
59b453eaa2
|
Speed up mm processor kwargs per request by splitting dynamic and static kwargs (#26483)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
|
2025-11-07 07:51:28 +08:00 |
|
Eugene Khvedchenya
|
827e4237bc
|
Fix failing test for CRadio (#27738)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com>
|
2025-11-06 15:32:25 -08:00 |
|
Varun Sundar Rabindranath
|
ca6f755d24
|
[BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-06 22:53:30 +00:00 |
|
Matthew Bonanni
|
ca90f50304
|
[Test] Add non-MoE DP test coverage (#28235)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-06 20:59:57 +00:00 |
|
Fang Han
|
da855b42d2
|
[Doc]: Make extraInit containers fully configurable in helm chart (#27497)
Signed-off-by: Fang Han <fhan0520@gmail.com>
|
2025-11-06 20:27:16 +00:00 |
|
Aleksandr Malyshev
|
449de9001a
|
[ROCm] triton fp8 kernel (#27058)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-11-06 14:46:44 -05:00 |
|
Vico Chu
|
d4aa65c998
|
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792)
Signed-off-by: Vico Chu <vico24826@gmail.com>
|
2025-11-06 19:09:19 +00:00 |
|
Julien Denize
|
7a8375f8a0
|
Add llama 4 scaling support (#28145)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-11-06 18:55:17 +00:00 |
|
Andy Lo
|
5e0c1fe69c
|
[Structured outputs] Upgrade llguidance to 1.3.0 (#28039)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-06 10:24:47 -08:00 |
|
Russell Bryant
|
4507a6dae4
|
CODEOWNERS: Add myself as reviewer on security docs (#28216)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-06 17:39:42 +00:00 |
|
Roy Wang
|
d1dd5f53e4
|
[Frontend] Fix logging format when enable response logging (#28049)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2025-11-06 16:25:39 +00:00 |
|
StanHatko
|
e52e4da971
|
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953)
Signed-off-by: Stan Hatko <stan_hatko@live.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-11-06 23:47:11 +08:00 |
|
Milos Puzovic
|
2176778cd3
|
[Doc] Add Arm CPUs to the list of supported targets in vLLM (#26018)
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
|
2025-11-06 15:30:26 +00:00 |
|
Eric Yue
|
0370679ce9
|
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
|
2025-11-06 07:29:46 -08:00 |
|
Harry Mellor
|
8816e375d3
|
[Docs] Switch to directory style URLs (#28058)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-06 07:06:33 -08:00 |
|
Michael Goin
|
f32229293e
|
Disable nm-testing models with issues in CI (#28206)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-06 06:19:07 -08:00 |
|
xiangze-arm
|
c757a15f0f
|
[CPU]Improve cpu fused moe perf (#27244)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
|
2025-11-06 11:04:18 +00:00 |
|
Chauncey
|
59a50afa08
|
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-06 10:40:03 +00:00 |
|
courage17340
|
981cadb35c
|
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-11-06 17:52:13 +08:00 |
|
wangxiyuan
|
c3ee80a01a
|
[V0 deprecation]clean up is_v1_supported_oracle (#28116)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-06 16:05:32 +08:00 |
|
Aditya Tewari
|
3755c14532
|
[CPU] Enable torch profiling (#28130)
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
|
2025-11-06 07:32:05 +00:00 |
|
Seungduk Kim
|
201dc98acc
|
Fix hard-coded parameter name in gemma3n.py (#27946)
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-05 23:07:36 -08:00 |
|
Julien Denize
|
a404e2c0f1
|
Patch Mistral Tokenizer (#28146)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-11-06 06:43:16 +00:00 |
|
Xiaozhu Meng
|
e31946f86e
|
[flashinfer] fix FI all2all with FI cutlass moe (#28166)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
|
2025-11-06 05:52:16 +00:00 |
|
gmagogsfm
|
bde5039325
|
[CI] Add compile/test_multimodal_compile.py to CI (#28151)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-06 05:41:47 +00:00 |
|
Jacob Zhong
|
d72299d47b
|
Make the cv2 dependency optional (#27780)
Signed-off-by: Jacob <cmpute@qq.com>
|
2025-11-06 05:08:55 +00:00 |
|
Lukas Geiger
|
80679f108f
|
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-06 04:05:12 +00:00 |
|
Isotr0py
|
43ecd0a900
|
[Chore] Clean up deepseek v2/v3 config copy (#28055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-06 03:46:30 +00:00 |
|
Chauncey
|
07d614511f
|
[Misc] Remove the duplicate code (#28111)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 21:07:47 -05:00 |
|
Vadim Gimpelson
|
f948ab6945
|
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-06 01:22:13 +00:00 |
|
Wentao Ye
|
d71af5f502
|
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-05 17:21:08 -08:00 |
|
Wentao Ye
|
90189c71a9
|
[Bug] Fix env string "0" same to True (#28159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-05 17:04:20 -08:00 |
|
Wentao Ye
|
d79d9f0780
|
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM (#28157)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-05 17:03:09 -08:00 |
|
Vadim Gimpelson
|
b6a248bdd7
|
[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-05 17:01:12 -08:00 |
|
Dayeol Lee
|
1767658559
|
[Debugging] Add annotation for easier trace analysis (#22496)
|
2025-11-05 16:52:52 -08:00 |
|
Kuntai Du
|
efe73e9b57
|
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token (#25431)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-11-06 00:12:00 +00:00 |
|
Zhewen Li
|
0b8e871e5e
|
[CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 15:40:24 -08:00 |
|
Zhewen Li
|
5ee93a5956
|
[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 15:40:10 -08:00 |
|
Snehlata
|
e15601789b
|
[Feature]: Add corrupted request metric to V1 metrics system. (#27306)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
|
2025-11-05 13:45:29 -08:00 |
|
Richard Zou
|
65ac8d8dc4
|
[Docs] Add guide to debugging vLLM-torch.compile integration (#28094)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-11-05 21:31:46 +00:00 |
|
Isotr0py
|
ffb08379d8
|
[Chore] Remove Nemotron-Nano-VL config copy (#28126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 20:06:45 +00:00 |
|
R3hankhan
|
e04492449e
|
[Hardware][IBM Z] Optimize s390x Dockerfile (#28023)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2025-11-05 11:25:44 -08:00 |
|
Michael Yao
|
518ec6b722
|
[Docs] Clean up README_TUNING.md (#28088)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-11-05 19:01:34 +00:00 |
|