Jonas M. Kübler
f98d4d38c0
[Bug] fix import and unit test ( #25558 )
...
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Roger Wang
d5c0f43b86
[Bugfix] Fix dummy video number of frames calculation ( #25553 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
youkaichao
54174c67f8
[misc] update the warning message ( #25566 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
d1e2d17b57
[BugFix] Potential Fix for FA3 full-cudagraph IMA ( #25490 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Woosuk Kwon
9914857f2b
[V0 Deprecation] Remove max_seq_len_to_capture ( #25543 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Jackmin801
7441d07360
[CI/Build] add nightly prime-rl integration tests ( #25207 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Cyrus Leung
4ca175ea0b
[Misc]] Move processing context to multimodal directory ( #25548 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Isotr0py
c39befcead
[CI/Build] Fix v1 OOT registration test ( #25547 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Li, Jiang
c8ef8a50d2
[Bugfix][CPU] Skip unsupported custom op register on CPU ( #25534 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Cyrus Leung
fc90ce79f0
[Misc] Retry HF processing if "Already borrowed" error occurs ( #25535 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Chengji Yao
5b4ba2e1e1
[TPU][Bugfix] fix the missing apply_model in tpu worker ( #25526 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Ben Browning
d7fb5a4ae8
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls ( #25514 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Corey Lowman
f52b991db6
[Perf] Fix jit compiles at runtime of fla gated delta rule ( #25432 )
...
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Benjamin Chislett
177c37e960
[Spec Decode] Enable FlashInfer Spec Decoding ( #25196 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Yong Hoon Shin
0e54bbe108
[KV sharing] Re-land Gemma3n model changes from #22628 ( #24357 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Nikhil Gupta
6b87ce2ecd
[fix]: add Arm 4bit fused moe support ( #23809 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
a986f17028
[BugFix] Fix MLA assert with CUTLASS MLA ( #25478 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Wentao Ye
faa58fa791
[Compile] Fix AMD Compile Error ( #25518 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Kyle Sayers
4ed6b67da3
[Core] Support weight_loader_v2 for UnquantizedLinearMethod ( #23036 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Benjamin Chislett
cb825af948
[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen ( #25520 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
baxingpiaochong
342d17fb7f
[V1][Metrics] Add per-request TPOT histogram ( #24015 )
...
Signed-off-by: baxingpiaochong <771405853@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
0xNullPath
3c62d28bb9
[Model] Support SeedOss Reason Parser ( #24263 )
...
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
ahao-anyscale
9596fbd6e5
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together ( #24922 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Wentao Ye
03585bc79d
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' ( #25519 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Chenxi Yang
770cb2e1f8
Add CUTLASS FP8 MOE benchmark scripts and kernel config ( #25302 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Doug Smith
b50fa00537
Improve output when failing json.loads() on structured output test ( #25483 )
...
Signed-off-by: dougbtv <dosmith@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
8e6a5e7dd4
[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch ( #25505 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Alexander Matveev
faae7a7eab
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 ( #25509 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
d562c2ea09
[Perf] Increase default max splits for FA3 full cudagraphs ( #25495 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Juan Villamizar
81ee45298d
[ROCm] Small functional changes for gptoss ( #25201 )
...
Signed-off-by: jpvillam <jpvillam@amd.com>
Co-authored-by: jpvillam <jpvillam@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Thomas Parnell
d12433adfc
[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel ( #25197 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Corey Lowman
4ebc513fc1
Add VLLM_NVTX_SCOPES_FOR_PROFILING=1 to enable nvtx.annotate scopes ( #25501 )
...
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
kourosh hakhamaneshi
7a8f0a3548
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting ( #25359 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Michael Goin
907bbca7b7
Remove redundant mutates_args and dispatch_key for direct_register_custom_op ( #25512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Andrew Xia
eb1f43bc82
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI ( #25428 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Michael Goin
99eaeebe66
Fix triton_reshape_and_cache_flash.py triton import ( #25522 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
rouchenzi
715e24e1b3
Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… ( #25493 )
...
Signed-off-by: rouchenzi <ruochenwen@gmail.com>
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Thomas Parnell
cf0e250200
[V0 Deprecation] Remove placeholder attn ( #25510 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Jialin Ouyang
0c11617ff1
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] ( #24830 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Isotr0py
930e691c65
[CI/Build] Fix and re-enable v1 PP test on CI ( #25496 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Gregory Shtrasberg
c0f11557e1
[Bugfix] Fix for the import error from #24588 ( #25481 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Chauncey
0438c65376
[Build] Update Xgrammar to 0.1.25 ( #25467 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Alec S
d8fda7420a
[Bugfix] gpt-oss container tool output bug ( #25485 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Wentao Ye
86e5b73d71
[CI] Fix Pre-commit Issue ( #25497 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Ilya Markov
e49561cd91
Enable symmetric memory all reduce by default only enabling for TP ( #25070 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Michael Goin
0e30643147
[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 ( #25508 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
jiahanc
8ba3b17cc1
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue ( #25406 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Michael Goin
8222e2651d
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE ( #25444 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
ElizaWszola
b672b8c3b8
[Performance] Move apply_w8a8_block_fp8_linear to an op class ( #24666 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Amir Samani
56201cfb01
[core] add nccl symmetric memory for all reduce ( #24532 )
...
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00