Lumina
|
81b16a2bc9
|
[Kernel] Better inf handling for grouped topk cu (#24886)
Signed-off-by: lumina37 <starry.qvq@gmail.com>
|
2025-09-18 05:53:55 +00:00 |
|
Simon Mo
|
e111d5b0ae
|
[CLI] Use streaming in CLI chat and completion commands (#23769)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-17 22:30:26 -07:00 |
|
Simon Mo
|
a904ea78ea
|
[benchmark] add peak throughput metrics and plot (#23867)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-17 22:30:02 -07:00 |
|
Benjamin Chislett
|
b7433ca1a4
|
[Spec Decode] Efficient padded speculation (#24539)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-09-18 01:07:24 -04:00 |
|
Woosuk Kwon
|
5c65a72bb1
|
[V0 Deprecation] Remove more V0 tests (#25117)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 22:05:25 -07:00 |
|
YiwenC
|
9d8a2d86d2
|
[EPLB] Add EPLB support for hunyuan_v1 (#23078)
|
2025-09-18 04:51:35 +00:00 |
|
Chaojun Zhang
|
3bc18127ff
|
[XPU] Whisper model support on XPU Platform (#25123)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-09-18 04:30:10 +00:00 |
|
Andrew Sansom
|
bec060fd99
|
Mark prompt logprobs as incompatible with prompt embeds at API level (#25077)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-17 21:25:07 -07:00 |
|
YiwenC
|
52bc9d5b3e
|
[Model] enable data parallel for InternVL vision encoder (#23909)
Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
Signed-off-by: YiwenC <54658925+666even666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-17 21:11:46 -07:00 |
|
bnellnm
|
dc2979c585
|
[Kernels] Overlap shared experts with combine instead of dispatch (#24254)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-18 12:10:21 +08:00 |
|
toncao
|
027d37df38
|
[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models (#24960)
Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-18 12:08:50 +08:00 |
|
Lukas Geiger
|
b98219670f
|
[Core][MM] Cleanup MultiModalCache (#25006)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-09-17 21:08:41 -07:00 |
|
Harry Mellor
|
32baf1d036
|
[Docs] Clean up the contributing README (#25099)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-17 21:05:18 -07:00 |
|
Roger Wang
|
3127274d02
|
[MM Encoder] Apply DP ViT for Qwen3-VL model series (#24955)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-17 21:04:21 -07:00 |
|
bnellnm
|
4ac510f484
|
[Kernels] Enable DeepGEMM by default (#24462)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-17 20:19:52 -07:00 |
|
Woosuk Kwon
|
7fb2a5be28
|
[V0 Deprecation] Skip PP test (#25128)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 20:18:36 -07:00 |
|
Woosuk Kwon
|
6c036615dc
|
[V0 Deprecation] Remove misc V0 tests (#25118)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 19:41:55 -07:00 |
|
Woosuk Kwon
|
2fc24e94f9
|
[V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 19:40:44 -07:00 |
|
Woosuk Kwon
|
2c3c1bd07a
|
[V0 Deprecation] Remove V0 Engine tests (#25114)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 19:38:09 -07:00 |
|
bnellnm
|
5963b98b46
|
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-17 17:43:31 -06:00 |
|
elvischenv
|
e6585ddb45
|
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-17 16:37:23 -07:00 |
|
Karan Goel
|
2a4d6412e6
|
Add a batched auto tune script (#25076)
Signed-off-by: Karan Goel <karangoel@google.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-17 22:41:18 +00:00 |
|
elvischenv
|
e67a79db03
|
[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic (#24600)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-17 15:36:29 -07:00 |
|
Michael Goin
|
9f882d8791
|
Disable failing GPT-OSS Eval (Blackwell) for now (#25107)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-17 15:36:00 -07:00 |
|
Douglas Lehr
|
1a456c7c90
|
Aiter mha fp8 fix (#24991)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
|
2025-09-17 22:29:14 +00:00 |
|
Alexander Matveev
|
fedb75fa27
|
[Bugfix][B200] Fix cutlass_mla hang (#24966)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-17 18:06:38 -04:00 |
|
Andrew Xia
|
bff2e5f1d6
|
[gpt-oss][2] fix types for streaming (#24556)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-17 22:04:28 +00:00 |
|
czhu-cohere
|
3c068c637b
|
[Kernel] Faster pre-processing time for W4A8 (#23972)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-09-17 14:35:32 -07:00 |
|
ahao-anyscale
|
f20c3b0951
|
[BUG] Exclude .pth files when pulling remote files (#25092)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-09-17 20:42:09 +00:00 |
|
Mohammad Miadh Angkad
|
883131544f
|
[Bugfix] Update import path for bc_linter_include (#24766)
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
|
2025-09-17 20:33:11 +00:00 |
|
Yihua Cheng
|
ee5fd49150
|
[Misc] Update owners for KV connector and V1 offloading (#25041)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2025-09-17 12:37:29 -07:00 |
|
afeldman-nm
|
7ae9887542
|
[V1] Logits processor docs (#22919)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>
|
2025-09-17 11:53:12 -07:00 |
|
Michael Goin
|
e3db5ebb66
|
[CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor (#25086)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-17 11:15:05 -07:00 |
|
Woosuk Kwon
|
9d442b7c48
|
[V0 Deprecation] Remove V0 tests in test_sequence.py (#25088)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 11:08:45 -07:00 |
|
Woosuk Kwon
|
eb68c2dcd9
|
[CI] Revert back prepare_prompts and check_answers (#25087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 11:03:16 -07:00 |
|
Michael Goin
|
8b32464ac1
|
Change log level from info to debug for IOProcessor (#24999)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-17 10:21:28 -07:00 |
|
Woosuk Kwon
|
99cc41ad50
|
[V0 Deprecation] Remove unused output processor util (#25023)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-17 09:50:07 -07:00 |
|
Simon Mo
|
d6a518fdde
|
Remove unused find_cuda_init helper script (#25044)
|
2025-09-17 09:47:40 -07:00 |
|
Simon Mo
|
4aa8c7b047
|
cleanup: remove adapter commons (#25045)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-17 16:46:29 +00:00 |
|
Woosuk Kwon
|
4b946d693e
|
[V0 Deprecation] Remove V0 Core tests (#25082)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 09:32:42 -07:00 |
|
Michael Goin
|
087c6ffc92
|
[CI Bugfix] Fix failing test_invalid_env (#25078)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-17 08:28:58 -07:00 |
|
samzong
|
4a2d33e371
|
[Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-09-17 08:11:51 -07:00 |
|
Matthew Bonanni
|
8f3616f422
|
Remove old cutlass mla (#23961)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-17 14:31:43 +00:00 |
|
samzong
|
47f670b03b
|
[Docs] improve code formatting and comments for eliminate griffe build warning. (#25010)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-09-17 07:31:20 -07:00 |
|
Tao He
|
dd6a910aac
|
[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. (#24957)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-09-17 21:59:09 +08:00 |
|
dolpm
|
1b962e2457
|
[fix] lora benchmarks pass no_lora_flag_cpu (#23774)
Signed-off-by: Dylan Maloy <34420038+dolpm@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-17 21:22:25 +08:00 |
|
Aidyn-A
|
bfe9380161
|
Apply fixes for CUDA 13 (#24599)
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
|
2025-09-17 09:15:42 -04:00 |
|
Li, Jiang
|
9fccd04e30
|
[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-17 05:54:02 -07:00 |
|
danielafrimi
|
252ada5559
|
Add RADIO Vision Encoder Support to vLLM (#24595)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>
|
2025-09-17 05:53:30 -07:00 |
|
Cyrus Leung
|
e120533d7a
|
[Misc] Avoid use of deprecated AutoModelForVision2Seq (#25065)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-17 12:19:15 +00:00 |
|