Wei Wei
b839194931
[Kernel] Support DCP for Triton backend ( #25132 )
...
Signed-off-by: Wei Wei <wwei6@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jee Jee Li
1d6f767dc4
[Model] Improve DotsOCRForCausalLM ( #25466 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Shiyan Deng
b95429c920
[MISC] replace c10::optional with std::optional ( #25602 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
7319686692
Improve --help for enhanced user experience ( #24903 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wentao Ye
b3fd4ed80c
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor ( #25517 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Duncan Moss
461aa1463b
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel ( #25503 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tyler Michael Smith
b4a80dad98
[Logging] Improve log for when DeepEP HT disables CUDA Graphs ( #25531 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Woosuk Kwon
61a6443bc3
[V0 Deprecation] Remove unused classes in attention ( #25541 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
yewentao256
c8071faa5d
fix compile error
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
46ed215d6b
[Docs] Enable fail_on_warning for the docs build in CI ( #25580 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Sage Moore
0e0d51c9c6
Suppress benign cuBLAS warning when capturing cudagraphs with DBO ( #25596 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Shu Wang
72a5101c7a
Support mnnvl all2allv from Flashinfer ( #21003 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
rongfu.leng
7d9f44ad2a
[Bugfix] add cache model when from object storage get model ( #24764 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Michael Goin
984bfb4ba7
Fixes and updates to bench_per_token_quant_fp8 ( #25591 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Gregory Shtrasberg
b1f9a1f46a
[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order ( #25415 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Gregory Shtrasberg
3331ced61b
[ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled ( #25275 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
b614e0f82b
[Misc] Improve type annotations for jsontree ( #25577 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
44d6701f70
Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files ( #25564 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tao Hui
71566e8afc
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output ( #25405 )
...
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Peter Pan
88d8c72d5f
[docs] fix nixl kv_connector_extra_config.backends key ( #25565 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Russell Bryant
0cb913b0a2
[Benchmark] Fix regression in structured output benchmark ( #25500 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Jonas M. Kübler
f98d4d38c0
[Bug] fix import and unit test ( #25558 )
...
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Roger Wang
d5c0f43b86
[Bugfix] Fix dummy video number of frames calculation ( #25553 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
youkaichao
54174c67f8
[misc] update the warning message ( #25566 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
d1e2d17b57
[BugFix] Potential Fix for FA3 full-cudagraph IMA ( #25490 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Woosuk Kwon
9914857f2b
[V0 Deprecation] Remove max_seq_len_to_capture ( #25543 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Jackmin801
7441d07360
[CI/Build] add nightly prime-rl integration tests ( #25207 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Cyrus Leung
4ca175ea0b
[Misc]] Move processing context to multimodal directory ( #25548 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Isotr0py
c39befcead
[CI/Build] Fix v1 OOT registration test ( #25547 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Li, Jiang
c8ef8a50d2
[Bugfix][CPU] Skip unsupported custom op register on CPU ( #25534 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Cyrus Leung
fc90ce79f0
[Misc] Retry HF processing if "Already borrowed" error occurs ( #25535 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Chengji Yao
5b4ba2e1e1
[TPU][Bugfix] fix the missing apply_model in tpu worker ( #25526 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Ben Browning
d7fb5a4ae8
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls ( #25514 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Corey Lowman
f52b991db6
[Perf] Fix jit compiles at runtime of fla gated delta rule ( #25432 )
...
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Benjamin Chislett
177c37e960
[Spec Decode] Enable FlashInfer Spec Decoding ( #25196 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Yong Hoon Shin
0e54bbe108
[KV sharing] Re-land Gemma3n model changes from #22628 ( #24357 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Nikhil Gupta
6b87ce2ecd
[fix]: add Arm 4bit fused moe support ( #23809 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
a986f17028
[BugFix] Fix MLA assert with CUTLASS MLA ( #25478 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Wentao Ye
faa58fa791
[Compile] Fix AMD Compile Error ( #25518 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Kyle Sayers
4ed6b67da3
[Core] Support weight_loader_v2 for UnquantizedLinearMethod ( #23036 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Benjamin Chislett
cb825af948
[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen ( #25520 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
baxingpiaochong
342d17fb7f
[V1][Metrics] Add per-request TPOT histogram ( #24015 )
...
Signed-off-by: baxingpiaochong <771405853@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
0xNullPath
3c62d28bb9
[Model] Support SeedOss Reason Parser ( #24263 )
...
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
ahao-anyscale
9596fbd6e5
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together ( #24922 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Wentao Ye
03585bc79d
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' ( #25519 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Chenxi Yang
770cb2e1f8
Add CUTLASS FP8 MOE benchmark scripts and kernel config ( #25302 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Doug Smith
b50fa00537
Improve output when failing json.loads() on structured output test ( #25483 )
...
Signed-off-by: dougbtv <dosmith@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
8e6a5e7dd4
[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch ( #25505 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Alexander Matveev
faae7a7eab
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 ( #25509 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
d562c2ea09
[Perf] Increase default max splits for FA3 full cudagraphs ( #25495 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00