Nick Hill
45c0526ac9
[BugFix] Handle errors when preprocessing added requests ( #30895 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-19 01:29:11 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Andrey Talman
e06d0bf0aa
2.9.1 PyTorch release update ( #28495 )
2025-12-17 12:20:22 -08:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory ( #30749 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test ( #30723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:28:34 -05:00
Lucas Wilkinson
00a8d7628c
[BugFix] Fix memory spike in workspace allocation ( #30744 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-16 06:46:22 -08:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU ( #30693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-15 13:45:36 +00:00
Michael Goin
2f32a68d75
[CI] Update several models in registry that are available online now ( #30514 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-12-12 18:28:13 -08:00
Kevin H. Luu
b4039c08b5
[ci] Mark PrimeRL integration test as soft fail ( #30578 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-12 14:13:09 -08:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix ( #26668 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-12 13:28:20 -05:00
Sage Moore
b4054c8ab4
Revert "[CI] Add Async Eplb nightly CI tests ( #29385 )" ( #30431 )
2025-12-11 00:48:35 +00:00
Ilya Markov
0b6a8a304c
[BugFix] Fix non detected failing tests ( #30277 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-09 17:57:55 +00:00
Zhewen Li
263c38d74d
[CI/Build] Update batch invariant test trigger ( #30080 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-05 00:42:37 +00:00
Zhewen Li
c493b9d092
[CI/Build] Add MM code path to Examples Test ( #29986 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-03 19:21:45 -08:00
WeiQing Chen
7fe9c1a223
[CI] Add Async Eplb nightly CI tests ( #29385 )
...
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 09:51:08 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 15:54:28 +00:00
Shengqi Chen
4b612664fd
[CI] Renovation of nightly wheel build & generation (take 2) ( #29838 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 22:17:10 -08:00
Kevin H. Luu
ec7035c9d4
[ci] Make distributed 8 gpus test optional ( #29801 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-01 10:22:05 -08:00
Cyrus Leung
2afcec4dec
[Misc] Update TokenizerLike interface and move get_cached_tokenizer ( #29730 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 14:59:47 +08:00
Cyrus Leung
34a984274e
[Misc] Refactor tokenizer interface ( #29693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Angela Yi
4b17ce6815
Add gpu memory wait before test_async_tp ( #28893 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-28 20:19:05 -08:00
Isotr0py
d40c854009
[CI/Build] Rework CPU multimodal processor test ( #29684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-28 17:10:29 +00:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers ( #28878 )
...
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
2025-11-26 21:35:13 -05:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )" ( #29483 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
Harry Mellor
bf0c75cd4f
Make Transformers Nightly tests soft-fail and enable all tests ( #29401 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 12:41:15 +00:00
elvischenv
6330f9477d
[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-25 07:59:40 +00:00
Rémi Delacourt
12c007e288
EAGLE Support DP>1 ( #26086 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
2025-11-25 07:32:21 +00:00
Varun Sundar Rabindranath
e924bbb4f4
[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 ( #29195 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-24 16:06:17 +00:00
Cyrus Leung
d1cf8214e5
[Bugfix] Use HF config fields as fallback when loading Mistral config ( #29239 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-22 11:22:48 -07:00
Wentao Ye
1f400c58b8
[CI] Add batch invariant test to ci ( #27842 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 09:20:33 -07:00
Michael Goin
986ab5db63
[CI Bugfix] Fix Kernels DeepGEMM Test (H100) ( #29106 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-20 16:42:33 -08:00
Alexander Matveev
3aaa94ac99
[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier ( #28687 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-19 15:47:13 -08:00
Shu Wang
613abb50d5
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked ( #25990 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-19 13:29:06 -08:00
Copilot
61728cd1df
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests ( #28966 )
...
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 13:32:19 -05:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI ( #28625 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Nick Hill
637f292196
[CI] Fix broken pipeline ( #28781 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-15 08:44:14 -08:00
Angela Yi
f36292dbee
[compile] Enable sequence parallelism matching w/o custom ops enabled ( #27126 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-11-15 11:46:12 +00:00
Yanan Cao
262d263f6c
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning ( #28533 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-13 15:09:05 -05:00
Nick Hill
8832fff972
[BugFix] Fix mm_encoder_attn_backend arg type checking ( #28599 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-13 03:06:03 +00:00
Harry Mellor
51c599f0ec
Skip models that cannot currently init on Transformers v5 ( #28471 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 23:43:57 +00:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig ( #27593 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
Huamin Li
c748355e0d
[CI] Introduce autorun_on_main feature ( #27836 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-12 08:51:19 +00:00
zhrrr
68c09efc37
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model ( #27165 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2025-11-11 12:00:31 -05:00
usberkeley
3143eb23fc
[BugFix] Add test_outputs.py to CI pipeline ( #28466 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-11 16:01:30 +00:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Adrian Abeyta
a5a790eea6
[Bugfix] Ensure calculated KV scales are applied in attention. ( #27232 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-11-10 23:42:37 +00:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds ( #24248 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-10 18:33:11 -05:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-09 21:08:38 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly ( #28074 )
2025-11-07 15:58:16 +08:00