Jonas M. Kübler
|
58c360d9be
|
[Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
|
2025-09-24 10:17:59 +00:00 |
|
Roger Wang
|
42488dae69
|
[Bugfix] Fix dummy video number of frames calculation (#25553)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-24 09:47:30 +00:00 |
|
youkaichao
|
b67dece2d8
|
[misc] update the warning message (#25566)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-24 17:24:35 +08:00 |
|
Lucas Wilkinson
|
2338daffd3
|
[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-24 02:04:04 -07:00 |
|
Woosuk Kwon
|
2e19a848d4
|
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-24 01:51:39 -07:00 |
|
Jackmin801
|
77a7fce1bb
|
[CI/Build] add nightly prime-rl integration tests (#25207)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-24 08:44:22 +00:00 |
|
Cyrus Leung
|
6488f3481b
|
[Misc] Move processing context to multimodal directory (#25548)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-24 08:15:00 +00:00 |
|
Isotr0py
|
27ec3c78f3
|
[CI/Build] Fix v1 OOT registration test (#25547)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-24 08:03:13 +00:00 |
|
Li, Jiang
|
1cbcfb94de
|
[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-24 06:21:51 +00:00 |
|
Cyrus Leung
|
fed8a9b107
|
[Misc] Retry HF processing if "Already borrowed" error occurs (#25535)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-23 22:32:11 -07:00 |
|
Chengji Yao
|
190c45a6af
|
[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-09-24 05:18:08 +00:00 |
|
Ben Browning
|
5caaeb714c
|
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2025-09-24 03:20:38 +00:00 |
|
Corey Lowman
|
d747c2ef18
|
[Perf] Fix jit compiles at runtime of fla gated delta rule (#25432)
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-24 11:16:13 +08:00 |
|
Benjamin Chislett
|
c30b405b8f
|
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
|
2025-09-23 22:29:58 -04:00 |
|
Yong Hoon Shin
|
77d906995c
|
[KV sharing] Re-land Gemma3n model changes from #22628 (#24357)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-09-23 19:25:34 -07:00 |
|
Nikhil Gupta
|
359d293006
|
[fix]: add Arm 4bit fused moe support (#23809)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
|
2025-09-24 01:32:22 +00:00 |
|
Lucas Wilkinson
|
9df8da548e
|
[BugFix] Fix MLA assert with CUTLASS MLA (#25478)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-23 21:09:43 -04:00 |
|
Wentao Ye
|
bf68fd76a9
|
[Compile] Fix AMD Compile Error (#25518)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-24 00:42:48 +00:00 |
|
Kyle Sayers
|
de94289a98
|
[Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-09-23 18:30:26 -06:00 |
|
Benjamin Chislett
|
1983609239
|
[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen (#25520)
|
2025-09-24 00:19:56 +00:00 |
|
baxingpiaochong
|
d06b5a95cb
|
[V1][Metrics] Add per-request TPOT histogram (#24015)
Signed-off-by: baxingpiaochong <771405853@qq.com>
|
2025-09-23 18:19:04 -06:00 |
|
0xNullPath
|
be0bb568c9
|
[Model] Support SeedOss Reason Parser (#24263)
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 18:15:51 -06:00 |
|
ahao-anyscale
|
c8bde93367
|
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together (#24922)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-09-23 18:13:32 -06:00 |
|
Wentao Ye
|
88d7bdbd23
|
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' (#25519)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-24 00:07:51 +00:00 |
|
Chenxi Yang
|
0d235b874a
|
Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302)
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
|
2025-09-23 18:07:42 -06:00 |
|
Doug Smith
|
7ad5e50adf
|
Improve output when failing json.loads() on structured output test (#25483)
Signed-off-by: dougbtv <dosmith@redhat.com>
|
2025-09-23 18:03:31 -06:00 |
|
Lucas Wilkinson
|
dc464a3d39
|
[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch (#25505)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-23 18:00:29 -06:00 |
|
Alexander Matveev
|
1210e4d95b
|
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 (#25509)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-09-23 16:57:55 -07:00 |
|
Lucas Wilkinson
|
e0b24ea030
|
[Perf] Increase default max splits for FA3 full cudagraphs (#25495)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-23 16:53:34 -07:00 |
|
Juan Villamizar
|
bde2a1a8a4
|
[ROCm] Small functional changes for gptoss (#25201)
Signed-off-by: jpvillam <jpvillam@amd.com>
Co-authored-by: jpvillam <jpvillam@amd.com>
|
2025-09-23 23:39:50 +00:00 |
|
Thomas Parnell
|
5e25b12236
|
[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel (#25197)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
|
2025-09-23 23:23:30 +00:00 |
|
Corey Lowman
|
c85d75cf08
|
Add VLLM_NVTX_SCOPES_FOR_PROFILING=1 to enable nvtx.annotate scopes (#25501)
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
|
2025-09-23 22:50:09 +00:00 |
|
kourosh hakhamaneshi
|
abad204be6
|
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-09-23 15:49:09 -07:00 |
|
Michael Goin
|
7361ab379f
|
Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 22:48:40 +00:00 |
|
Andrew Xia
|
95bc60e4cb
|
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-23 15:46:46 -07:00 |
|
Michael Goin
|
4f2954f724
|
Fix triton_reshape_and_cache_flash.py triton import (#25522)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 15:26:10 -07:00 |
|
rouchenzi
|
eca7be9077
|
Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… (#25493)
Signed-off-by: rouchenzi <ruochenwen@gmail.com>
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>
|
2025-09-23 22:17:49 +00:00 |
|
Thomas Parnell
|
969b4da3a6
|
[V0 Deprecation] Remove placeholder attn (#25510)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-23 22:12:14 +00:00 |
|
Jialin Ouyang
|
4f8c4b890a
|
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-23 15:11:14 -07:00 |
|
Isotr0py
|
ae002924e9
|
[CI/Build] Fix and re-enable v1 PP test on CI (#25496)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 21:58:25 +00:00 |
|
Gregory Shtrasberg
|
690f948e4a
|
[Bugfix] Fix for the import error from #24588 (#25481)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-23 21:31:08 +00:00 |
|
Chauncey
|
08275ec0a2
|
[Build] Update Xgrammar to 0.1.25 (#25467)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-23 21:25:46 +00:00 |
|
Alec S
|
c828d1bf98
|
[Bugfix] gpt-oss container tool output bug (#25485)
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
|
2025-09-23 20:43:45 +00:00 |
|
Wentao Ye
|
8b8a8afc89
|
[CI] Fix Pre-commit Issue (#25497)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-24 04:09:37 +08:00 |
|
Ilya Markov
|
8bdd8b5c51
|
Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 15:53:00 -04:00 |
|
Michael Goin
|
a8ffc4f0f2
|
[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 (#25508)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 12:49:55 -07:00 |
|
jiahanc
|
d5944d5146
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-09-23 15:44:35 -04:00 |
|
Michael Goin
|
24fab45d96
|
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 15:29:26 -04:00 |
|
ElizaWszola
|
63400259d0
|
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-09-23 12:03:10 -07:00 |
|
Amir Samani
|
8c1c81a3de
|
[core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 14:33:06 -04:00 |
|