9991 Commits

Author SHA1 Message Date
Isotr0py
f7f76a8668 [Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Russell Bryant
054c8b526f [ux] Switch a warning to debug about a pytorch fallback (#23750)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Li, Jiang
2469b8291b [CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jonas M. Kübler
18c20257bf [torch.compile] Make Query Quantization Fusable (#24914)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
youkaichao
a5fa821b96 [misc] log info messages by default for hanging / busy / idle (#25627)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
af10a37c6c [mypy] Fix wrong type annotations related to tuple (#25660)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
chenlang
a88371f84e [Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112)
Signed-off-by: chenlang <chen.lang5@zte.com.cn>
Co-authored-by: chenlang <10346245@zte.com.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Kunshang Ji
d7f6489f50 [XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
222411313d [CI/Build] Fix flaky entrypoints test (#25663)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
22114ffebb Add backward compatibility for guided_... API (#25615)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
wang.yuqi
f3d9099b44 [V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
yyzxw
3d940e2c3f [Bugfix] Parse SpeculativeConfig Error (#25142)
Signed-off-by: zxw <1020938856@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
686cfd91e3 [mypy] Further improve MM type annotations (#25654)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Isotr0py
f17d37b006 [Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Agata Dobrzyniewicz
034c0152db [Bugfix] Add triton.language.tensor placeholder (#25649)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Roger Wang
fd28c58825 [Misc] Fix Qwen3-VL video_grid_thw typing (#25646)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Fadi Arafeh
5e16b8c552 [fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin (#25579)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tyler Michael Smith
6c6e553644 Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nicole LiHui 🥜
6a437a4178 typo: remove duplicate is (#25641)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jacob Kahn
004eed39ff Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
8b17d2554c [Misc] Simplify PoolerOutput and move to v1/outputs (#25629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
courage17340
94b78f576c [Bugfix] fix apply_temperature to avoid nan in probs (#24734)
Signed-off-by: courage17340 <courage17340@163.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nicole LiHui 🥜
d8ffa3c5f4 optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
XuruiYang
c26e7b14d7 [Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Saman A. Pour
12c21d28c1 Enable Fbgemm NVFP4 on Dense models (#25609)
Signed-off-by: Saman Keon <samanamp@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wentao Ye
517a857166 [Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wei Wei
b839194931 [Kernel] Support DCP for Triton backend (#25132)
Signed-off-by: Wei Wei <wwei6@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jee Jee Li
1d6f767dc4 [Model] Improve DotsOCRForCausalLM (#25466)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Shiyan Deng
b95429c920 [MISC] replace c10::optional with std::optional (#25602)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
7319686692 Improve --help for enhanced user experience (#24903)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wentao Ye
b3fd4ed80c [Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Duncan Moss
461aa1463b feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tyler Michael Smith
b4a80dad98 [Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Woosuk Kwon
61a6443bc3 [V0 Deprecation] Remove unused classes in attention (#25541)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
yewentao256
c8071faa5d fix compile error
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
46ed215d6b [Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Sage Moore
0e0d51c9c6 Suppress benign cuBLAS warning when capturing cudagraphs with DBO (#25596)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Shu Wang
72a5101c7a Support mnnvl all2allv from Flashinfer (#21003)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
rongfu.leng
7d9f44ad2a [Bugfix] add cache model when from object storage get model (#24764)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Michael Goin
984bfb4ba7 Fixes and updates to bench_per_token_quant_fp8 (#25591)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Gregory Shtrasberg
b1f9a1f46a [ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Gregory Shtrasberg
3331ced61b [ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled (#25275)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
b614e0f82b [Misc] Improve type annotations for jsontree (#25577)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
44d6701f70 Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files (#25564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tao Hui
71566e8afc [Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output (#25405)
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Peter Pan
88d8c72d5f [docs] fix nixl kv_connector_extra_config.backends key (#25565)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Russell Bryant
0cb913b0a2 [Benchmark] Fix regression in structured output benchmark (#25500)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Jonas M. Kübler
f98d4d38c0 [Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Roger Wang
d5c0f43b86 [Bugfix] Fix dummy video number of frames calculation (#25553)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
youkaichao
54174c67f8 [misc] update the warning message (#25566)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00