Michael Goin
|
dc191cc5d9
|
[CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:56 -07:00 |
|
fhl2000
|
ceb346015c
|
[V1] address post issues related to #20059 (part 1) (#23046)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:56 -07:00 |
|
Michael Goin
|
b6f16d37b0
|
[CI] Add E2E Blackwell Quantized MoE Test (#25723)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:56 -07:00 |
|
Michael Goin
|
5157781987
|
[Docs] Add Toronto Meetup (#25773)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:56 -07:00 |
|
Frank Wang
|
f16c440c9f
|
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition (#25355)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:56 -07:00 |
|
Clouddude
|
8c1b61bd77
|
[Doc]: improve CPU(x86) build-wheel-from-source section (#25617)
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
阿丹(adan)
|
e0175fbf01
|
Eagle3 that supports the Minicpm3 model (#24243)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: liudan <adan@minicpm.com>
Co-authored-by: liudan <liudan@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Jiangyun Zhu
|
c72298213d
|
[Misc] fix unique_filepath (#25732)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Seiji Eicher
|
41174e2803
|
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Lucas Wilkinson
|
6ca8d9753c
|
[BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) (#25622)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Isotr0py
|
d70c154975
|
[Quantization] Add field to skip unquantized modules for GPTQ config (#25455)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
129a643b4c
|
[CI/Build] Fix some V1 tests not being run (#25569)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
d3c732e985
|
[CI/Build] Split up Distributed Tests (#25572)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
wang.yuqi
|
fb0eece290
|
[Bugfix] Properly abort pooling request. (#25734)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Chauncey
|
515e30b023
|
[CI] Fix test_shared_storage_connector_hashes (#25748)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Chih-Chieh Yang
|
62ae26c870
|
[Model] Mamba2 varlen refactor (#21467)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
87ee8535a6
|
[Doc] Update Batch-level DP docs (#25757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Xu Wenqing
|
ced693e845
|
Support LongCat-Flash-Chat tool call (#24083)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Sage Moore
|
fa55373af1
|
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
yyzxw
|
c761b84d5f
|
[misc] refactor speculative config (#25657)
Signed-off-by: zxw <1020938856@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Icey
|
bc37468b3c
|
Remove cuda hard-code in compute_causal_conv1d_metadata (#25555)
Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Tao He
|
067fe8b10e
|
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Iceber Gu
|
0aea9348cc
|
fix: print outputt offline_inference/base/chat.py example (#25744)
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
xaguilar-amd
|
79586c5449
|
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703)
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Andrew Sansom
|
b2d5d42337
|
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Andrew Sansom
|
74ea69f413
|
fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions (#25738)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
yitingdc
|
e82e3b55f6
|
[CI/Build] fix doc build warning: Failed to get 'name: description' pair (#25733)
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Eugene Khvedchenya
|
9e6628ccfc
|
EVS Support (Video tokens pruning) (#22980)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Nick Hill
|
6ada221271
|
[Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Russell Bryant
|
ef160aa08e
|
[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Aleksandr Malyshev
|
c064c82674
|
Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Nick Hill
|
6f97de4e47
|
[Misc] Don't log shm dequeue delay warning on worker side (#25720)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Wentao Ye
|
3a32aa8a6b
|
[Refactor] Remove DeepGEMM OP Register (#25710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Shu Wang
|
1d21080118
|
Fix routing_bias dtype (#25711)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
tomeras91
|
1d1436c3f7
|
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Zhuohan Li
|
37d836081a
|
[Core] Enable command line logging for LLMEngine (#25610)
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Ekagra Ranjan
|
f3a478b55e
|
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
b558c3a8b7
|
[Optimization] Use a cheaper cache key in get_model_architecture (#25682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
745b204ddc
|
[Optimization] Streamline InputPreprocessor (#25702)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
b0e9f04bbd
|
[Misc] Simplify test_argsort_mm_positions (#25690)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Jee Jee Li
|
80385959af
|
[V0 deprecation] Clean up LoRA (#25686)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Matthew Bonanni
|
a355561291
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Isotr0py
|
9659b7e78f
|
[V0 deprecation] Clean up V0 fallback in compilation config (#25675)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Cyrus Leung
|
34e6a31e40
|
[Model] Define merge_by_field_config MM interface (#25676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Tao Hui
|
c7ca3c5d2f
|
[Model] Add optional parameter to reasoning parser constructor (#25554)
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Lucas Wilkinson
|
fe6357a780
|
[BugFix] Fix DBO hang (#25625)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Michael Goin
|
0cee734ab4
|
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" (#25681)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
AlonKejzman
|
252a0ff8c3
|
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Tyler Michael Smith
|
2655d7ab83
|
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|
Nicolò Lucchesi
|
91d4299774
|
[Misc] Remove cruft file in repo (#25678)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:55 -07:00 |
|