Agata Dobrzyniewicz
3c2b2ccece
[Bugfix] Add triton.language.tensor placeholder (#25649)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
2025-09-25 10:31:14 +00:00
Roger Wang
7be9ffcd9f
[Misc] Fix Qwen3-VL video_grid_thw typing (#25646)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-25 10:16:45 +00:00
Fadi Arafeh
393de22d2e
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin (#25579)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-09-25 09:39:18 +00:00
Tyler Michael Smith
1260180c67
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-25 08:05:21 +00:00
Nicole LiHui 🥜
af4ee63e0e
typo: remove duplicate is (#25641)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-09-25 00:46:22 -07:00
Jacob Kahn
bc092ea873
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-25 07:37:03 +00:00
Cyrus Leung
755ed7b05b
[Misc] Simplify PoolerOutput and move to v1/outputs (#25629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 06:47:03 +00:00
courage17340
a676e668ee
[Bugfix] fix apply_temperature to avoid nan in probs (#24734)
Signed-off-by: courage17340 <courage17340@163.com>
2025-09-25 05:32:21 +00:00
Nicole LiHui 🥜
c85be1f6dd
optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
2025-09-25 05:03:25 +00:00
XuruiYang
845adb3ec6
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Saman A. Pour
90b139cfff
Enable Fbgemm NVFP4 on Dense models (#25609)
Signed-off-by: Saman Keon <samanamp@outlook.com>
2025-09-24 21:12:53 -07:00
Wentao Ye
4492e3a554
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-24 18:52:52 -07:00
Wei Wei
05c19485a5
[Kernel] Support DCP for Triton backend (#25132)
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-09-24 18:09:34 -07:00
Jee Jee Li
52d0cb8458
[Model] Improve DotsOCRForCausalLM (#25466)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-25 07:58:08 +08:00
Shiyan Deng
5c1e496a75
[MISC] replace c10::optional with std::optional (#25602)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2025-09-24 16:56:21 -07:00
Harry Mellor
e7f27ea648
Improve --help for enhanced user experience (#24903)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 23:08:18 +00:00
Wentao Ye
1f29141258
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-24 18:52:36 -04:00
Duncan Moss
6160ba4151
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-09-24 18:50:04 -04:00
Tyler Michael Smith
fea8006062
[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-24 22:43:06 +00:00
Woosuk Kwon
e6750d0b18
[V0 Deprecation] Remove unused classes in attention (#25541)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-24 13:24:40 -07:00
Harry Mellor
8c853050e7
[Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 19:30:33 +00:00
Sage Moore
f84a472a03
Suppress benign cuBLAS warning when capturing cudagraphs with DBO (#25596)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-09-24 19:02:08 +00:00
Shu Wang
54e42b72db
Support mnnvl all2allv from Flashinfer (#21003)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-24 14:38:16 -04:00
rongfu.leng
2dda3e35d0
[Bugfix] add cache model when from object storage get model (#24764)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-09-24 18:11:16 +00:00
Michael Goin
d83f3f7cb3
Fixes and updates to bench_per_token_quant_fp8 (#25591)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-09-24 08:30:15 -07:00
Gregory Shtrasberg
302eb941f3
[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-24 11:25:10 -04:00
Gregory Shtrasberg
487745ff49
[ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled (#25275)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-24 11:24:39 -04:00
Cyrus Leung
9313be5017
[Misc] Improve type annotations for jsontree (#25577)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-24 22:49:58 +08:00
Harry Mellor
8938774c79
Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files (#25564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 13:59:05 +00:00
Tao Hui
e18b714b2e
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output (#25405)
Signed-off-by: taohui <taohui3@gmail.com>
2025-09-24 20:58:00 +08:00
Peter Pan
b1068903fd
[docs] fix nixl kv_connector_extra_config.backends key (#25565)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-24 11:00:27 +00:00
Russell Bryant
164299500b
[Benchmark] Fix regression in structured output benchmark (#25500)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-24 10:40:42 +00:00
Jonas M. Kübler
58c360d9be
[Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
2025-09-24 10:17:59 +00:00
Roger Wang
42488dae69
[Bugfix] Fix dummy video number of frames calculation (#25553)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-24 09:47:30 +00:00
youkaichao
b67dece2d8
[misc] update the warning message (#25566)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-24 17:24:35 +08:00
Lucas Wilkinson
2338daffd3
[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-24 02:04:04 -07:00
Woosuk Kwon
2e19a848d4
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-24 01:51:39 -07:00
Jackmin801
77a7fce1bb
[CI/Build] add nightly prime-rl integration tests (#25207)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-24 08:44:22 +00:00
Cyrus Leung
6488f3481b
[Misc] Move processing context to multimodal directory (#25548)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-24 08:15:00 +00:00
Isotr0py
27ec3c78f3
[CI/Build] Fix v1 OOT registration test (#25547)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-24 08:03:13 +00:00
Li, Jiang
1cbcfb94de
[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-24 06:21:51 +00:00
Cyrus Leung
fed8a9b107
[Misc] Retry HF processing if "Already borrowed" error occurs (#25535)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-23 22:32:11 -07:00
Chengji Yao
190c45a6af
[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-09-24 05:18:08 +00:00
Ben Browning
5caaeb714c
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-09-24 03:20:38 +00:00
Corey Lowman
d747c2ef18
[Perf] Fix jit compiles at runtime of fla gated delta rule (#25432)
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-24 11:16:13 +08:00
Benjamin Chislett
c30b405b8f
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
2025-09-23 22:29:58 -04:00
Yong Hoon Shin
77d906995c
[KV sharing] Re-land Gemma3n model changes from #22628 (#24357)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-09-23 19:25:34 -07:00
Nikhil Gupta
359d293006
[fix]: add Arm 4bit fused moe support (#23809)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-09-24 01:32:22 +00:00
Lucas Wilkinson
9df8da548e
[BugFix] Fix MLA assert with CUTLASS MLA (#25478)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-23 21:09:43 -04:00
Wentao Ye
bf68fd76a9
[Compile] Fix AMD Compile Error (#25518)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-24 00:42:48 +00:00