yitingdc
e82e3b55f6
[CI/Build] fix doc build warning: Failed to get 'name: description' pair (#25733)
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Eugene Khvedchenya
9e6628ccfc
EVS Support (Video tokens pruning) (#22980)
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nick Hill
6ada221271
[Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Russell Bryant
ef160aa08e
[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Aleksandr Malyshev
c064c82674
Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135)
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nick Hill
6f97de4e47
[Misc] Don't log shm dequeue delay warning on worker side (#25720)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wentao Ye
3a32aa8a6b
[Refactor] Remove DeepGEMM OP Register (#25710)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Shu Wang
1d21080118
Fix routing_bias dtype (#25711)
...
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
tomeras91
1d1436c3f7
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708)
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Zhuohan Li
37d836081a
[Core] Enable command line logging for LLMEngine (#25610)
...
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Ekagra Ranjan
f3a478b55e
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
b558c3a8b7
[Optimization] Use a cheaper cache key in get_model_architecture (#25682)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
745b204ddc
[Optimization] Streamline InputPreprocessor (#25702)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
b0e9f04bbd
[Misc] Simplify test_argsort_mm_positions (#25690)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jee Jee Li
80385959af
[V0 deprecation] Clean up LoRA (#25686)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Matthew Bonanni
a355561291
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Isotr0py
9659b7e78f
[V0 deprecation] Clean up V0 fallback in compilation config (#25675)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
34e6a31e40
[Model] Define merge_by_field_config MM interface (#25676)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tao Hui
c7ca3c5d2f
[Model] Add optional parameter to reasoning parser constructor (#25554)
...
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Lucas Wilkinson
fe6357a780
[BugFix] Fix DBO hang (#25625)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Michael Goin
0cee734ab4
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" (#25681)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
AlonKejzman
252a0ff8c3
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662)
...
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tyler Michael Smith
2655d7ab83
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532)
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nicolò Lucchesi
91d4299774
[Misc] Remove cruft file in repo (#25678)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Isotr0py
f7f76a8668
[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Russell Bryant
054c8b526f
[ux] Switch a warning to debug about a pytorch fallback (#23750)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Li, Jiang
2469b8291b
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jonas M. Kübler
18c20257bf
[torch.compile] Make Query Quantization Fusable (#24914)
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
youkaichao
a5fa821b96
[misc] log info messages by default for hanging / busy / idle (#25627)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
af10a37c6c
[mypy] Fix wrong type annotations related to tuple (#25660)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
chenlang
a88371f84e
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112)
...
Signed-off-by: chenlang <chen.lang5@zte.com.cn>
Co-authored-by: chenlang <10346245@zte.com.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Kunshang Ji
d7f6489f50
[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643)
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
222411313d
[CI/Build] Fix flaky entrypoints test (#25663)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
22114ffebb
Add backward compatibility for guided_... API (#25615)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
wang.yuqi
f3d9099b44
[V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
yyzxw
3d940e2c3f
[Bugfix] Parse SpeculativeConfig Error (#25142)
...
Signed-off-by: zxw <1020938856@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
686cfd91e3
[mypy] Further improve MM type annotations (#25654)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Isotr0py
f17d37b006
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Agata Dobrzyniewicz
034c0152db
[Bugfix] Add triton.language.tensor placeholder (#25649)
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Roger Wang
fd28c58825
[Misc] Fix Qwen3-VL video_grid_thw typing (#25646)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Fadi Arafeh
5e16b8c552
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin (#25579)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tyler Michael Smith
6c6e553644
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nicole LiHui 🥜
6a437a4178
typo: remove duplicate is (#25641)
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Jacob Kahn
004eed39ff
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
...
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Cyrus Leung
8b17d2554c
[Misc] Simplify PoolerOutput and move to v1/outputs (#25629)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
courage17340
94b78f576c
[Bugfix] fix apply_temperature to avoid nan in probs (#24734)
...
Signed-off-by: courage17340 <courage17340@163.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Nicole LiHui 🥜
d8ffa3c5f4
optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
XuruiYang
c26e7b14d7
[Model] Add LongCat-Flash (#23991)
...
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Saman A. Pour
12c21d28c1
Enable Fbgemm NVFP4 on Dense models (#25609)
...
Signed-off-by: Saman Keon <samanamp@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Wentao Ye
517a857166
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00